Hierarchical modeling is wonderful and here to stay, but hyperparameter priors are often chosen in a casual fashion. Unfortunately, as the number of hyperparameters grows, the effects of casual choices can multiply, leading to considerably inferior performance. As an extreme, but not uncommon, example, use of the wrong hyperparameter priors can even lead to impropriety of the posterior.

For exchangeable hierarchical multivariate normal models, we first 
determine when a standard class of hierarchical priors results in 
proper or improper posteriors. We next determine which elements of 
this class lead to admissible estimators of the mean under quadratic 
loss; such considerations provide one useful guideline for choice among 
hierarchical priors. Finally, computational issues with the resulting 
posterior distributions are addressed. 

1. Introduction. 



1.1. The model and the problems. Consider the block multivariate normal situation (sometimes called the "matrix of means problem") specified by the following hierarchical Bayesian model:

$$X \mid \theta \sim N_p(\theta, I), \qquad \theta \mid \beta, V \sim N_p(\mathbf{1}_m \otimes \beta,\ I_m \otimes V), \qquad (\beta, V) \sim \pi(\beta, V), \tag{1}$$

where $X = (X_1^t, \ldots, X_m^t)^t$ and $\theta = (\theta_1^t, \ldots, \theta_m^t)^t$, the $X_i$ are $k \times 1$ observation vectors, $k \ge 2$, the $\theta_i$ are $k \times 1$ unknown mean vectors, $\beta$ is a $k \times 1$ unknown "hyper-mean" vector and $V$ is an unknown $k \times k$ "hyper-covariance matrix." This is more commonly written as, for $i = 1, 2, \ldots, m$ and independently, $X_i \sim N_k(\theta_i, I)$, $\theta_i \sim N_k(\beta, V)$. Note that $p = mk$. Efron and Morris [16, 17] introduced the study of this model from an empirical Bayes perspective. Today, it is more common to analyze the model from a hierarchical Bayesian perspective (cf. [2, 18]), based on choice of a hyperprior $\pi(\beta, V)$. Such hyperpriors are often chosen quite casually, for example, constant priors or the "nonhierarchical independence Jeffreys prior" $|V|^{-(k+1)/2}$ (see Section 1.2). In this paper we formally study properties of such choices.

The first issue that arises when using improper hyperpriors is that of propriety of the resulting posterior distributions (cf. [38]). In Section 2 we discuss choices of $\pi(\beta, V)$ which yield proper posterior distributions. The importance of this is illustrated by the fact that we have seen many instances of the use of $|V|^{-(k+1)/2}$ in this or similar situations, even though it is known to generally yield an improper posterior distribution when used as a hyperprior (see Section 2).

A more refined question, from the decision-theoretic point of view, is that of choosing hyperpriors so that the resulting Bayes estimators, for a specified loss function, are admissible. The particular version of this problem that we will study is that of estimating $\theta$ by its posterior mean $\delta^\pi(x)$, under the quadratic loss

$$L(\theta, \delta) = (\theta - \delta)^t Q (\theta - \delta), \tag{2}$$

where $Q$ is a known positive-definite matrix. The performance of an estimator $\delta$ will be evaluated by the usual frequentist risk function

$$R(\theta, \delta) = E_\theta\left[L(\theta, \delta(X))\right]. \tag{3}$$

The estimator $\delta$ is inadmissible if there exists another estimator with risk function nowhere bigger and somewhere smaller. If no such better estimator exists, $\delta$ is admissible.

In Section 3 conditions on $\pi(\beta, V)$ are presented under which the Bayes estimator $\delta^\pi$ is admissible and inadmissible. The motivation for looking at this problem is not that this specific decision-theoretic formulation is necessarily of major practical importance. The motivation is, instead, that use of "objective" improper priors in hierarchical modeling is of enormous practical importance, yet little is known about which such priors are good or bad. The most successful approach to evaluation of objective improper priors has been to study the frequentist properties of the ensuing Bayes procedures (see [3] for discussion and many references). In particular, it is important that the prior distribution not be too diffuse, and study of admissibility is the most powerful tool known for detecting an over-diffuse prior. Also see [10] for general discussion of the utility of the decision-theoretic perspective in modern statistical inference.

The results in the paper generalize immediately to the case where the identity covariance matrix $I$ for the $X_i$ is replaced by a known positive-definite covariance matrix, but for notational simplicity we only consider the identity case. More generally, the motivation for this study is to obtain insight into the choice of hyperpriors in multivariate hierarchical situations.
The possibilities for normal hierarchical modeling are endless, and it is barely 
conceivable that formal results about posterior propriety and admissibility 
can be obtained in general. The hope behind this study is that what is 
learned in this specific multivariate hierarchical model can provide guidance 
in more complex hierarchical models. 

1.2. The hyperprior distributions being studied. We will study hyperprior densities of the form

$$\pi(\beta, V) = \pi(\beta)\pi(V).$$

For $V$, we will study priors that satisfy the following condition, where $d_1 \ge d_2 \ge \cdots \ge d_k > 0$ are the eigenvalues of $V$.

Condition 1. For $0 \le l \le 1$,

$$\frac{C_1}{|I + V|^{(a_2 - a_1)}\, |V|^{a_1} \left[\prod_{i<j}(d_i - d_j)\right]^{(1 - l)}} \le \pi(V) \le \frac{C_2}{|I + V|^{(a_2 - a_1)}\, |V|^{a_1} \left[\prod_{i<j}(d_i - d_j)\right]^{(1 - l)}},$$

where $C_1$ and $C_2$ are positive constants and $|A|$ denotes the determinant of $A$. Many common noninformative priors satisfy this condition, including:

Constant prior. $\pi(V) = 1$; here $a_1 = a_2 = 0$ and $l = 1$.

Nonhierarchical independence Jeffreys prior. $\pi(V) = |V|^{-(k+1)/2}$; here $a_1 = a_2 = (k+1)/2$ and $l = 1$.

Hierarchical independence Jeffreys prior. $\pi(V) = |I + V|^{-(k+1)/2}$; here $a_1 = 0$, $a_2 = (k+1)/2$ and $l = 1$.

Nonhierarchical reference prior. $\pi(V) = \left[|V| \prod_{i<j}(d_i - d_j)\right]^{-1}$; here $a_1 = a_2 = 1$ and $l = 0$. (See [40].)

Hierarchical reference priors.

(a) $\pi(V) = \left[|I + V| \prod_{i<j}(d_i - d_j)\right]^{-1}$; here $a_1 = 0$, $a_2 = 1$ and $l = 0$.

(b) $\pi(V) = \left[|V|^{(2k-1)/(2k)} \prod_{i<j}(d_i - d_j)\right]^{-1}$; here $a_1 = a_2 = (2k - 1)/(2k)$ and $l = 0$.
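To make the $(a_1, a_2, l)$ parameterization concrete, the following Python sketch (an illustration, not code from the paper) evaluates the logarithm of the Condition 1 envelope with $C_1 = C_2 = 1$; for each prior listed above this envelope coincides exactly with $\log \pi(V)$. The dictionary keys are hypothetical labels, and only NumPy is assumed.

```python
import numpy as np


# (a1, a2, l) for the priors listed above; the keys are hypothetical labels.
def prior_params(name, k):
    table = {
        "constant": (0.0, 0.0, 1.0),
        "nonhier_jeffreys": ((k + 1) / 2, (k + 1) / 2, 1.0),
        "hier_jeffreys": (0.0, (k + 1) / 2, 1.0),
        "nonhier_reference": (1.0, 1.0, 0.0),
        "hier_reference_a": (0.0, 1.0, 0.0),
        "hier_reference_b": ((2 * k - 1) / (2 * k), (2 * k - 1) / (2 * k), 0.0),
    }
    return table[name]


def log_prior_V(V, a1, a2, l):
    """log of 1 / (|I+V|^(a2-a1) |V|^a1 [prod_{i<j}(d_i - d_j)]^(1-l))."""
    k = V.shape[0]
    d = np.sort(np.linalg.eigvalsh(V))[::-1]  # d_1 >= d_2 >= ... >= d_k > 0
    logdet_I_plus_V = np.sum(np.log1p(d))      # |I+V| = prod_i (1 + d_i)
    logdet_V = np.sum(np.log(d))
    out = -(a2 - a1) * logdet_I_plus_V - a1 * logdet_V
    if l < 1:  # the eigenvalue-gap factor only enters when l < 1
        gaps = [d[i] - d[j] for i in range(k) for j in range(i + 1, k)]
        out -= (1 - l) * np.sum(np.log(gaps))
    return out


V = np.diag([3.0, 2.0, 1.0])
for name in ["constant", "hier_jeffreys", "hier_reference_a"]:
    print(name, log_prior_V(V, *prior_params(name, k=3)))
```

Working on the log scale keeps the product of eigenvalue gaps numerically stable when $k$ is moderately large.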

We have already alluded to the nonhierarchical independence Jeffreys prior, which formally is the Jeffreys prior for a covariance matrix in a nonhierarchical setting with given mean. Unfortunately, this prior seems to be commonly used for covariance matrices at any level of a hierarchy, typically yielding improper posteriors, as will be seen in Section 2. Those who recognize the problem often instead use the constant prior, or the hierarchical independence Jeffreys prior, which arises from considering the "marginal model" formed by integrating over $\beta$ in the original model and computing the independence Jeffreys prior for this marginal model.

Similarly, the nonhierarchical reference prior yields an improper posterior in hierarchical settings (shown in Section 2). The two versions of hierarchical reference priors given above arise from quite different perspectives. Prior (a) arises from considering the marginal model formed by integrating over $\beta$ in the original model, and applying the Yang and Berger [40] reference prior formula to the covariance matrix $I + V$ that arises in the marginal model. (The differences of eigenvalues for this matrix are the same as the differences of the eigenvalues for $V$.) Prior (b) arises from a combination of computational and admissibility considerations that are summarized in Sections 1.5 and 1.6, respectively.

Note that if the covariance matrix for the $X_i$ were a known $\Sigma$, instead of $I$, then $I$ in the above priors would be replaced by $\Sigma$. It could not then be said, however, that the reference prior formula is that which would result from applying the Yang and Berger [40] reference prior formula to the covariance matrix $\Sigma + V$ that arises in the marginal model, since the differences of eigenvalues of this matrix will no longer equal the differences of the eigenvalues of $V$.

Three commonly considered priors for the hyperparameter $\beta$ are:

Case 1. Constant prior. $\pi(\beta) = 1$.

Case 2. Conjugate prior. $\pi(\beta)$ is $N_k(\beta^0, A)$, where $\beta^0$ and $A$ are subjectively specified.
Case 3. Hierarchical prior. $\pi(\beta)$ is itself given in two stages:

$$\beta \mid \lambda \sim N_k(\beta^0, \lambda A), \qquad \lambda \sim \pi(\lambda), \quad \lambda > 0, \tag{4}$$

where $\beta^0$ and $A$ are again specified, and $\pi(\lambda)$ satisfies:

Condition 2.

(i) $\int_0^c \pi(\lambda)\, d\lambda < \infty$ for $c > 0$;

(ii) $\pi(\lambda) \sim C \lambda^{-b}$ $(b > 0)$ as $\lambda \to \infty$ for some constant $C > 0$.

As discussed in [7], an important example of a Case 3 prior is obtained by choosing

$$\pi(\lambda) \propto \lambda^{-b} e^{-c/\lambda},$$

that is, an inverse Gamma$(b - 1, c^{-1})$ density. This clearly satisfies Condition 2, and the resulting prior for $\beta$ is

$$\pi(\beta) = \int \pi(\beta \mid \lambda)\pi(\lambda)\, d\lambda \propto \left[1 + \frac{1}{2c}(\beta - \beta^0)^t A^{-1} (\beta - \beta^0)\right]^{-(k/2 + b - 1)},$$

which is a multivariate $t$-distribution with median $\beta^0$, scale matrix proportional to $A$ and $2(b - 1)$ degrees of freedom. We will be particularly interested in the improper version of this prior with $c = 1/2$, $\beta^0 = 0$, $A = I$ and $b = 1/2$, corresponding to

$$\pi(\beta) \propto \left[1 + \|\beta\|^2\right]^{-(k-1)/2}. \tag{5}$$
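The mixture integral behind the multivariate $t$ form can be checked in one step. Writing $Q = (\beta - \beta^0)^t A^{-1}(\beta - \beta^0)$, the $N_k(\beta^0, \lambda A)$ density contributes $\lambda^{-k/2} e^{-Q/(2\lambda)}$ up to factors free of $\lambda$ and $\beta$, and the standard identity $\int_0^\infty \lambda^{-s-1} e^{-a/\lambda}\, d\lambda = \Gamma(s)\, a^{-s}$ (for $s, a > 0$), applied with $s = k/2 + b - 1$ and $a = c + Q/2$, gives

$$\int_0^\infty \frac{e^{-Q/(2\lambda)}}{\lambda^{k/2}} \cdot \frac{e^{-c/\lambda}}{\lambda^{b}}\, d\lambda = \Gamma\!\left(\frac k2 + b - 1\right)\left(c + \frac Q2\right)^{-(k/2 + b - 1)} \propto \left[1 + \frac{Q}{2c}\right]^{-(k/2 + b - 1)}.$$

With $c = b = 1/2$, $\beta^0 = 0$ and $A = I$, the exponent is $(k - 1)/2$ and $1 + Q/(2c) = 1 + \|\beta\|^2$, which is (5). The requirement $s > 0$ holds because $k \ge 2$, even though $\pi(\lambda)$ itself is improper for this choice.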



1.3. Related literature. Hierarchical Bayesian analysis has been widely applied to many theoretical and practical problems (cf. [8, 11, 21, 23]). Results and many references to decision-theoretic analysis of hierarchical Bayesian models can be found in [5, 6, 7, 18, 31]. Reference [7] considered the following hierarchical normal model: $X = (X_1, X_2, \ldots, X_p)^t \sim N_p(\theta, \Sigma)$, with $\Sigma$ being a known positive-definite matrix. The paper considered the common two-stage prior distribution for $\theta$ given by $\theta \sim N_p(\beta\mathbf{1}, \sigma_\pi^2 I)$, $(\beta, \sigma_\pi^2) \sim \pi_1(\sigma_\pi^2)\pi_2(\beta)$, where $\mathbf{1}$ is the $p$-vector of 1's and $I$ is the identity matrix, and presented choices of $\pi_1(\sigma_\pi^2)$ and $\pi_2(\beta)$ which yield proper posteriors and admissible Bayes estimators under quadratic loss. This is thus the special case of our model where $k = 1$, and this paper can be viewed as an extension of those results to the vector mean problem (and, hence, our restriction to $k \ge 2$).

The more general decision-theoretic background of this paper is the huge literature on shrinkage estimation, initiated by the demonstration in [33] that the usual estimator for the mean of a multivariate normal distribution is not admissible when $p \ge 3$. This huge literature can be accessed from, for instance, [36].
The key to the admissibility and inadmissibility results presented in this paper is the fundamental paper [9], which provided the crucial insight to allow determination of admissibility and inadmissibility of Bayes estimators.

1.4. A transformation and revealed concern. It is convenient, for both intuitive and technical reasons, to write $V = H^t D H$, where $H$ is the matrix of eigenvectors corresponding to $D = \mathrm{diag}(d_1, d_2, \ldots, d_k)$, such that $H^t H = I$. Indeed, we will make the change of variables from $V$ to $(D, H)$, and rewrite the prior as

$$\pi(V)\, dV = \pi(H, D)\, I_{[d_1 > d_2 > \cdots > d_k]}\, dD\, dH,$$

where $dV = \prod_{i \le j} dV_{ij}$, $dD = \prod_{i=1}^k dd_i$, $dH$ denotes the invariant Haar measure over the space of orthonormal matrices and $I_{[d_1 > d_2 > \cdots > d_k]}$ denotes the indicator function over the specified set. (Because of Condition 1, equality of any eigenvalues has measure 0.)

From [19], the functional relationship between $\pi(V)$ and $\pi(H, D)$ is

$$\pi(H, D) = \pi(H^t D H) \prod_{i<j}(d_i - d_j).$$

Thus Condition 1 becomes

Condition 1'. For $0 \le l \le 1$,

$$\frac{C_1 \left[\prod_{i<j}(d_i - d_j)\right]^{l}}{|I + D|^{(a_2 - a_1)}\, |D|^{a_1}} \le \pi(H, D) \le \frac{C_2 \left[\prod_{i<j}(d_i - d_j)\right]^{l}}{|I + D|^{(a_2 - a_1)}\, |D|^{a_1}}.$$

Under this transformation, the common objective priors for $V$ are as follows:

1. The constant prior is now $\pi(H, D) = \prod_{i<j}(d_i - d_j)$.

2. The nonhierarchical independence Jeffreys prior is $\pi(H, D) = |D|^{-(k+1)/2} \prod_{i<j}(d_i - d_j)$.

3. The hierarchical independence Jeffreys prior is $\pi(H, D) = |I + D|^{-(k+1)/2} \prod_{i<j}(d_i - d_j)$.

4. The nonhierarchical reference prior is $\pi(H, D) = |D|^{-1}$.

5. The hierarchical reference priors are (a) $\pi(H, D) = |I + D|^{-1}$ and (b) $\pi(H, D) = |D|^{-(2k-1)/(2k)}$.

This transformation reveals a significant difficulty of any prior that can be written as a function of $|V|$: in the $(H, D)$ space, such priors contain the factor $\prod_{i<j}(d_i - d_j)$, which gives low mass to close eigenvalues, and hence effectively forces the eigenvalues apart. (The effective prior on $H$ is just constant, which is natural since $H$ ranges over a compact space, and hence has no effect on the eigenvalues.) This is contrary to common intuition, in that one is often debating between choice of a covariance matrix with equal eigenvalues or choice of an arbitrary covariance matrix; if anything, this would suggest that one should choose a prior that pushes the eigenvalues closer together.

This intuition also receives support from the frequentist literature. The independence Jeffreys prior (and often-employed modifications such as $|I + V|$) is of this suspicious form and, when used at the first level of a normal model, results in estimates of $V$ that are proportional to $S$, the sample covariance matrix. The frequentist literature, starting with [34] and continuing with such works as [22, 26, 27, 30, 40], shows that $S$ has eigenvalues that are too dispersed and that shrinking the eigenvalues of $S$ together is necessary for good performance. Since multiples of $S$ arise as Bayes estimators for priors of the "suspicious" form, there appears to be a direct analogy between what frequentists observed about $S$ and the concern that these priors force the eigenvalues apart.
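The dispersion of the eigenvalues of $S$ is easy to see in a small simulation. The sketch below is illustrative only (it is not taken from the cited works): it draws $n$ observations from $N_k(0, I)$, so every population eigenvalue equals 1, and reports the spread of the eigenvalues of $S$. The sizes $k = 20$, $n = 40$ and the seed are arbitrary choices.

```python
import numpy as np


def sample_cov_eigenvalues(k=20, n=40, seed=0):
    """Eigenvalues (descending) of S = X^t X / n for n draws from N_k(0, I).

    Every population eigenvalue equals 1, so any spread in the returned
    values is sampling dispersion of S.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, k))
    S = X.T @ X / n  # mean known to be zero, so no centering
    return np.sort(np.linalg.eigvalsh(S))[::-1]


eig = sample_cov_eigenvalues()
print(f"largest {eig[0]:.2f}, smallest {eig[-1]:.2f}, average {eig.mean():.2f}")
```

Typically the largest eigenvalue lands well above 1 and the smallest well below 1, while their average stays near 1: exactly the spreading that the shrinkage estimators cited above are designed to undo.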

In contrast to this behavior, the reference and hierarchical reference priors do not contain the term $\prod_{i<j}(d_i - d_j)$ in the transformed space, and hence are neutral with respect to expansion or shrinkage of the eigenvalues. Interestingly, in [40] (see also [37]), it is shown that the Bayes estimators arising from the reference prior (in the nonhierarchical model) behave very similarly to the Stein [34] and Haff [22] estimators, suggesting that such neutral behavior is natural for frequentist estimators; that is, shrinking the eigenvalues of $S$ corresponds to a Bayesian prior that is neutral about the eigenvalues. (It should be noted that, in the more recent Bayesian literature, aggressive shrinkage of eigenvalues, correlations or other features of the covariance matrix is entertained; cf. [14, 15, 25] and the references therein. This may well be desirable in many practical situations, but is more aggressive in its prior assumptions than the objective priors we consider.)

1.5. Computation. Hierarchical models are typically handled today by 
Gibbs sampling, possibly with rejection or Metropolis-Hastings steps in the 
Gibbs sampler (cf. [12, 32]). We briefly indicate considerations in utilizing 
the priors discussed in Section 1.2 within such computational frameworks. 

Use of the Case 1 (constant) or Case 2 (normal) priors for $\beta$ causes no difficulties; sampling of $\beta$ can simply be carried out with a Gibbs step, as its full conditional will be a normal distribution. The Case 3 prior is almost as easy to utilize, because of its representation as a mixture of normals. Indeed, one purposely introduces the latent variable $\lambda$ having the density in (4); sampling of $\beta$ is then done from its full conditional (also given $\lambda$), which is normal, with $\lambda$ then being sampled from its full conditional which, for $\pi(\lambda) \propto \lambda^{-b} e^{-c/\lambda}$, is an inverse Gamma$\left(b - 1 + k/2,\ \left[c + \tfrac12(\beta - \beta^0)^t A^{-1}(\beta - \beta^0)\right]^{-1}\right)$ density. In particular, the recommended default hyperprior $\pi(\beta) \propto [1 + \|\beta\|^2]^{-(k-1)/2}$ is handled as above, sampling $\lambda$ from the inverse Gamma$\left((k - 1)/2,\ 2/[1 + \|\beta\|^2]\right)$ density.
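A minimal NumPy sketch of one such Gibbs sweep over $(\beta, \lambda)$ follows, assuming the recommended default prior ($\beta^0 = 0$, $A = I$, $b = c = 1/2$) and given current values of $V$ and of $\theta$ (an $m \times k$ array whose rows are the $\theta_i$). The $\beta$ update is the standard normal-normal conjugate step with precision $mV^{-1} + \lambda^{-1} I$. For the $\lambda$ update we read inverse Gamma$(\alpha, s)$ as the law of $1/U$ with $U \sim$ Gamma(shape $\alpha$, scale $s$), which is the reading under which $\lambda^{-b} e^{-c/\lambda}$ is inverse Gamma$(b - 1, c^{-1})$. The function names are illustrative.

```python
import numpy as np


def draw_beta(theta, V, lam, rng):
    """beta | theta, V, lambda when theta_i ~ N_k(beta, V), beta | lambda ~ N_k(0, lambda I)."""
    m, k = theta.shape
    V_inv = np.linalg.inv(V)
    precision = m * V_inv + np.eye(k) / lam
    cov = np.linalg.inv(precision)
    cov = (cov + cov.T) / 2  # guard against round-off asymmetry
    mean = cov @ (V_inv @ theta.sum(axis=0))
    return rng.multivariate_normal(mean, cov)


def draw_lambda(beta, rng):
    """lambda | beta: 1/lambda ~ Gamma(shape=(k-1)/2, scale=2/(1+||beta||^2))."""
    k = beta.shape[0]
    shape = (k - 1) / 2
    scale = 2.0 / (1.0 + beta @ beta)
    return 1.0 / rng.gamma(shape, scale)


def gibbs_sweep_beta_lambda(theta, V, lam, rng):
    """One sweep: beta from its normal full conditional, then lambda given beta."""
    beta = draw_beta(theta, V, lam, rng)
    lam = draw_lambda(beta, rng)
    return beta, lam


rng = np.random.default_rng(1)
theta = rng.normal(size=(10, 3))
beta, lam = gibbs_sweep_beta_lambda(theta, np.eye(3), 1.0, rng)
```

In a full sampler these two steps would alternate with the $\theta$ and $V$ updates discussed next.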

Dealing with the hyper-covariance matrix $V$ is not as easy (cf. [13]), except for the constant prior $\pi(V) = 1$, for which the full conditional of $V$ is simply an inverse Wishart distribution; alas, this is not a desirable prior in other respects. More attractive is the hierarchical independence Jeffreys prior $\pi(V) = |I + V|^{-(k+1)/2}$. Defining $W(\theta, \beta) = \sum_{i=1}^m (\theta_i - \beta)(\theta_i - \beta)^t$, the resulting full conditional for $V$ can be written

$$\pi(V \mid \theta, \beta) \propto \frac{1}{|I + V|^{(k+1)/2}\, |V|^{m/2}} \exp\left(-\frac12 \operatorname{tr}\left(V^{-1} W(\theta, \beta)\right)\right),$$

which, unfortunately, is not of closed form. Still, one can easily sample from this full conditional using the following accept-reject sampling algorithm: Propose a candidate $V^*$ from the inverse Wishart$(m, W(\theta, \beta))$ density

$$g(V) \propto |V|^{-[m/2 + (k+1)/2]} \exp\left(-\frac12 \operatorname{tr}\left(V^{-1} W(\theta, \beta)\right)\right). \tag{6}$$

Accept the candidate with probability $P = \left(|V^*| / |I + V^*|\right)^{(k+1)/2}$, returning to the proposal step if the candidate is rejected, and moving on to another full conditional if it is accepted.

For large V or m or small dimension k, the acceptance probability will 
be quite high. 
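The accept-reject step can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the inverse Wishart draw in (6) is generated by inverting a sum of $m$ outer products of $N_k(0, W^{-1})$ vectors, which assumes an integer $m \ge k$; a Bartlett-decomposition sampler would be needed for non-integer degrees of freedom. The names `draw_inv_wishart` and `sample_V_hier_jeffreys` are hypothetical.

```python
import numpy as np


def draw_inv_wishart(df, W, rng):
    """Draw from the density prop. to |V|^-(df+k+1)/2 exp(-tr(V^-1 W)/2).

    Assumes integer df >= k: V is the inverse of a Wishart(df, W^{-1}) draw.
    """
    k = W.shape[0]
    L = np.linalg.cholesky(np.linalg.inv(W))
    Z = rng.standard_normal((df, k)) @ L.T  # rows ~ N_k(0, W^{-1})
    V = np.linalg.inv(Z.T @ Z)
    return (V + V.T) / 2


def sample_V_hier_jeffreys(theta, beta, rng, max_tries=100_000):
    """Accept-reject draw from pi(V | theta, beta) under pi(V) = |I + V|^-(k+1)/2."""
    m, k = theta.shape
    R = theta - beta
    W = R.T @ R  # W(theta, beta) = sum_i (theta_i - beta)(theta_i - beta)^t
    for tries in range(1, max_tries + 1):
        V_star = draw_inv_wishart(m, W, rng)  # proposal (6), df = m
        _, logdet_V = np.linalg.slogdet(V_star)
        _, logdet_IV = np.linalg.slogdet(np.eye(k) + V_star)
        log_P = 0.5 * (k + 1) * (logdet_V - logdet_IV)  # log (|V*|/|I+V*|)^((k+1)/2)
        if np.log(rng.uniform()) < log_P:
            return V_star, tries
    raise RuntimeError("no candidate accepted")


rng = np.random.default_rng(0)
theta = rng.normal(scale=5.0, size=(20, 3))
V, tries = sample_V_hier_jeffreys(theta, theta.mean(axis=0), rng)
```

Because the ratio of target to proposal is $(|V|/|I + V|)^{(k+1)/2} \le 1$, the accepted draws are exact; the returned `tries` count gives a direct view of the acceptance rate.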

When using the hierarchical independence Jeffreys prior, one can gain efficiency by working with the marginal distribution of $V$, instead of the full conditionals. This is particularly convenient in the Case 1 scenario, where the overall posterior distribution can be written $\pi(\theta \mid V, x)\pi(V \mid x)$, the first posterior being a normal distribution, and hence trivial to sample from, and the marginal posterior of $V$ being proportional to the integrand in the first expression of Lemma 2.1, namely

$$\frac{1}{|I + V|^{(k+m)/2}} \exp\left(-\frac12 \sum_{i=1}^m (x_i - \bar x)^t (I + V)^{-1} (x_i - \bar x)\right).$$

As discussed in [18] (although they utilized the constant prior for $V$), one can construct a rejection sampler for $V$ by simply generating $B = I + V^*$ from the inverse Wishart$\left(m + k, \sum_{i=1}^m (x_i - \bar x)(x_i - \bar x)^t\right)$ density, accepting the candidate $V^* = B - I$ if it is positive definite and returning to generate a new $B$ if it is not. This will have a reasonable acceptance probability if $V$ is large or $m$ is large.
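A Python sketch of this rejection sampler, offered as an illustration only: rather than relying on a particular inverse Wishart labeling, the code matches the exponent $(k + m)/2$ of the displayed density directly. In the degrees-of-freedom convention used in (6), where the exponent is $(\nu + k + 1)/2$, this corresponds to $\nu = m - 1$. The sketch assumes $m - 1 \ge k$, so that the scatter matrix is invertible and the integer-df construction applies; the function name is hypothetical.

```python
import numpy as np


def sample_V_marginal_case1(x, rng, max_tries=100_000):
    """Rejection draw of V from the Case 1 marginal posterior
    prop. to |I+V|^-(k+m)/2 exp(-tr((I+V)^-1 S)/2), S = sum_i (x_i - xbar)(x_i - xbar)^t.

    B = I + V is drawn with density prop. to |B|^-(df+k+1)/2 exp(-tr(B^-1 S)/2),
    df = m - 1, and kept only if B - I is positive definite.
    """
    m, k = x.shape
    R = x - x.mean(axis=0)
    S = R.T @ R
    df = m - 1  # (df + k + 1)/2 = (k + m)/2
    L = np.linalg.cholesky(np.linalg.inv(S))
    for tries in range(1, max_tries + 1):
        Z = rng.standard_normal((df, k)) @ L.T  # rows ~ N_k(0, S^{-1})
        B = np.linalg.inv(Z.T @ Z)
        B = (B + B.T) / 2
        V_star = B - np.eye(k)
        if np.linalg.eigvalsh(V_star).min() > 0:
            return V_star, tries
    raise RuntimeError("no positive-definite candidate found")


rng = np.random.default_rng(0)
x = rng.normal(scale=10.0, size=(50, 3))
V, tries = sample_V_marginal_case1(x, rng)
```

Truncating the draw of $B$ to $\{B - I > 0\}$ gives exactly the displayed density for $V$, so no further correction is needed.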

For the hierarchical reference priors, it seems that Metropolis-Hastings must be used to sample from the full conditionals. The "standard" approach is that utilized in [40] and [28]. In this approach one first performs the exponential matrix transform of $V$, which translates the set of positive-definite matrices into unconstrained Euclidean space. Then a hit-and-run Metropolis-Hastings algorithm is employed to produce the Markov chain. This algorithm can be directly utilized here, requiring only the change in the acceptance probability induced by using the hierarchical reference priors instead of the nonhierarchical reference priors.

Since a Metropolis-Hastings step is required anyway for the hierarchical reference priors, one can again gain efficiency by working with the marginal distributions of $V$, instead of the full conditionals. Taking the Case 3 situation for illustration, one uses the posterior form

$$\pi(\theta \mid \beta, V, x)\, \pi(\beta \mid V, \lambda, x)\, \pi(V, \lambda \mid x),$$

where the first two posteriors are simply normal distributions, and hence trivial to sample, and the marginal posterior of $(V, \lambda)$ is proportional to the integrand in the first expression of Lemma 2.3, that is,

$$\pi(V, \lambda \mid x) \propto \frac{1}{|I + V|^{(m-1)/2}\, |I + V + m\lambda A|^{1/2}} \exp\left(-\frac12 \sum_{i=1}^m (x_i - \bar x)^t (I + V)^{-1} (x_i - \bar x)\right) \times \exp\left(-\frac12 m \bar x^t (I + V + m\lambda A)^{-1} \bar x\right) \pi(V)\pi(\lambda).$$

One proceeds by applying the exponential matrix transform to $V$ and then running a hit-and-run algorithm for the transformed $V$ and $\lambda$. For each $(V, \lambda)$ in the chain (or, probably better, for, say, every 100th in the chain) one can then generate $\beta$ from the normal $\pi(\beta \mid V, \lambda, x)$ and then $\theta$ from the normal $\pi(\theta \mid \beta, V, x)$.
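The pieces of this scheme that do not involve the hit-and-run chain itself can be sketched in Python: the unnormalized log marginal posterior of $(V, \lambda)$ displayed above (with $\beta^0 = 0$, as in Lemma 2.3), and the two normal conditional draws. The conditionals follow from $x_i \mid \beta \sim N_k(\beta, I + V)$ with $\beta \sim N_k(0, \lambda A)$, and from $\theta_i \mid \beta, V, x_i \sim N_k\big((I + V^{-1})^{-1}(x_i + V^{-1}\beta), (I + V^{-1})^{-1}\big)$. The matrix-exponential transform and the hit-and-run proposal are not shown, and all names, including the optional `log_prior` callback, are illustrative assumptions.

```python
import numpy as np


def log_marginal_V_lambda(x, V, lam, A, log_prior=None):
    """Log of the displayed pi(V, lambda | x), up to an additive constant.

    log_prior(V, lam), if given, should return log pi(V) + log pi(lambda).
    """
    m, k = x.shape
    xbar = x.mean(axis=0)
    R = x - xbar
    B = np.eye(k) + V
    C = B + m * lam * A
    _, logdet_B = np.linalg.slogdet(B)
    _, logdet_C = np.linalg.slogdet(C)
    within = np.trace(np.linalg.solve(B, R.T @ R))  # sum_i r_i^t B^{-1} r_i
    between = m * xbar @ np.linalg.solve(C, xbar)
    out = -0.5 * ((m - 1) * logdet_B + logdet_C + within + between)
    if log_prior is not None:
        out += log_prior(V, lam)
    return out


def draw_beta_given_V_lambda(x, V, lam, A, rng):
    """beta | V, lambda, x with x_i | beta ~ N_k(beta, I + V), beta ~ N_k(0, lambda A)."""
    m, k = x.shape
    B_inv = np.linalg.inv(np.eye(k) + V)
    cov = np.linalg.inv(m * B_inv + np.linalg.inv(lam * A))
    cov = (cov + cov.T) / 2
    mean = cov @ (m * B_inv @ x.mean(axis=0))
    return rng.multivariate_normal(mean, cov)


def draw_theta_given_beta_V(x, beta, V, rng):
    """theta_i | beta, V, x_i for all i at once (rows of the returned array)."""
    m, k = x.shape
    V_inv = np.linalg.inv(V)
    cov = np.linalg.inv(np.eye(k) + V_inv)
    cov = (cov + cov.T) / 2
    means = (x + V_inv @ beta) @ cov  # row i equals cov @ (x_i + V^{-1} beta)
    return means + rng.multivariate_normal(np.zeros(k), cov, size=m)
```

A useful check on `log_marginal_V_lambda` is that, with flat `log_prior`, it differs only by the constant $(mk/2)\log(2\pi)$ from the log-density of the stacked vector $x$ under $N_{mk}\big(0, I_m \otimes (I + V) + \mathbf{1}\mathbf{1}^t \otimes \lambda A\big)$.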

If one wishes to stick to Gibbs sampling for the hierarchical reference priors (as would be the case, e.g., if one were working with a complex model for which marginalization could not be carried out), and further desires an easy-to-code algorithm, one could use Metropolis-Hastings on the full conditional for $V$ with the proposal in (6). (For justification as to why this is the best inverse Wishart proposal, see [39].) Writing $V$ for the current value, $V^*$ for the candidate generated from (6), and $d_i$ and $d_i^*$ for their respective eigenvalues, the acceptance probabilities for the (a) and (b) versions of the hierarchical reference prior would then be, respectively,

$$\min\left\{1,\ \frac{\prod_{i<j}(d_i - d_j)}{\prod_{i<j}(d_i^* - d_j^*)} \cdot \frac{|I + V|\, |V^*|^{(k+1)/2}}{|I + V^*|\, |V|^{(k+1)/2}}\right\}$$

and

$$\min\left\{1,\ \frac{\prod_{i<j}(d_i - d_j)}{\prod_{i<j}(d_i^* - d_j^*)} \cdot \left(\frac{|V^*|}{|V|}\right)^{(k - 1 + k^{-1})/2}\right\}.$$
Note, also, that it is generally best to iterate a number of times on this Metropolis step, keeping only the last value, before moving on to another full conditional (as this step is considerably less efficient than the others).
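The Metropolis step can be sketched in Python as an independence sampler whose log importance weight is $\log w(V) = \frac{k+1}{2}\log|V| + \log \pi(V)$, the log ratio of the full conditional of $V$ to the proposal (6); accepting with probability $\min\{1, w(V^*)/w(V)\}$ reproduces the two acceptance probabilities above. This is an illustration, not the implementation of [39]; it assumes an integer $m \ge k$ for the inverse Wishart draw, and the function names are hypothetical.

```python
import numpy as np


def log_gap(V):
    """sum_{i<j} log(d_i - d_j) over the eigenvalues of V."""
    d = np.linalg.eigvalsh(V)  # ascending order
    k = len(d)
    return sum(np.log(d[j] - d[i]) for i in range(k) for j in range(i + 1, k))


def log_weight(V, version):
    """(k+1)/2 log|V| + log pi(V) for hierarchical reference prior (a) or (b)."""
    k = V.shape[0]
    _, logdet_V = np.linalg.slogdet(V)
    if version == "a":
        _, logdet_IV = np.linalg.slogdet(np.eye(k) + V)
        log_prior = -logdet_IV - log_gap(V)
    elif version == "b":
        log_prior = -(2 * k - 1) / (2 * k) * logdet_V - log_gap(V)
    else:
        raise ValueError("version must be 'a' or 'b'")
    return 0.5 * (k + 1) * logdet_V + log_prior


def draw_inv_wishart(df, W, rng):
    """Inverse Wishart draw matching (6); assumes integer df >= k."""
    L = np.linalg.cholesky(np.linalg.inv(W))
    Z = rng.standard_normal((df, W.shape[0])) @ L.T
    V = np.linalg.inv(Z.T @ Z)
    return (V + V.T) / 2


def mh_update_V(V, theta, beta, version, rng, n_iter=10):
    """Iterate the independence Metropolis-Hastings step n_iter times;
    return the last state and the number of accepted moves."""
    m = theta.shape[0]
    R = theta - beta
    W = R.T @ R
    lw = log_weight(V, version)
    moves = 0
    for _ in range(n_iter):
        V_star = draw_inv_wishart(m, W, rng)
        lw_star = log_weight(V_star, version)
        if np.log(rng.uniform()) < lw_star - lw:  # accept w.p. min{1, w(V*)/w(V)}
            V, lw = V_star, lw_star
            moves += 1
    return V, moves
```

Counting `moves` over many calls gives the kind of nonmove statistics reported in Table 1.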

For small $k$ or large $m$, this simple approach will work reasonably well. For instance, in a simulation reported in detail in [39], the average number of Metropolis iterations before a move occurred was as indicated in Table 1. Since the proposal moves widely over the parameter space, a Metropolis scheme that moves at least once in every 10 iterations is often acceptable; thus, for $m \ge 30$, one can use this algorithm to do the calculation with $k$ up to 7. With larger $m$, such as 100, the algorithm is still acceptable for $k = 15$. When this scheme is not efficient enough, the exponential matrix transform hit-and-run approach mentioned above has proven to be very effective (but harder to program).

1.6. Summary and generalizations. The results in the paper require significant technical machinery. This machinery is not necessary for understanding the basic conclusions, so we present the most important conclusions and potential generalizations here. Note that the conclusions depend on intuitive appeal (e.g., Section 1.4), posterior propriety (Section 2), admissibility (Section 3) and computational simplicity (Section 1.5).

None of the priors on $\beta$ significantly affects posterior propriety or causes difficulties in the posterior computation. Hence admissibility is the most important criterion for deciding between them. It seems that use of the constant prior $\pi(\beta) = 1$ results in inadmissibility, except for the case $k = 2$. (This is, of course, not a surprise, in that two dimensions is typically the cutoff for admissibility with constant priors on means.) The Case 2 conjugate prior is, perhaps, reasonable, if one has subjective information about $\beta$. Among the Case 3 default priors, the prior $\pi(\beta) \propto [1 + \|\beta\|^2]^{-(k-1)/2}$ is excellent from the perspective of admissibility for all $k$, and is the prior that we actually recommend for default use. Part of the motivation here is the many studies that have shown the great success of these mixture-of-normals priors in shrinkage estimation in particular (cf. [1, 20, 35]), and robust Bayesian estimation in general (cf. [4]). There is the caveat, however, that this prior should probably only be applied when the $\beta_i$ are roughly "exchangeable," which might well require some reparameterization to ensure. Note that we even recommend use of this prior when $k = 2$. It is often thought that shrinkage should only be used when $k \ge 3$, but it can be used to practical advantage even when $k = 2$ (even though there are no longer uniform dominance results).

Table 1
Average number of nonmoves

 k  | m = 20 | m = 30 | m = 50 | m = 100
----|--------|--------|--------|--------
 3  |   6.89 |   4.92 |   2.14 |   1.06
 5  |   9.83 |   5.74 |   2.96 |   1.21
 7  |  13.52 |   8.50 |   4.03 |   2.27
 10 |  18.74 |  10.86 |   5.42 |   3.46
 12 |  33.67 |  19.63 |   7.61 |   5.07
 15 | 127.35 |  42.98 |  17.89 |   9.36

Of considerably more importance than the prior on $\beta$ is the prior on $V$. The two priors for $V$ that we have seen most commonly used in practice are the constant prior (or, equivalently, a "vague proper inverse Wishart" prior) and the nonhierarchical Jeffreys prior (or a vague proper inverse Wishart equivalent). Use of the nonhierarchical Jeffreys prior is simply a mistake, in that it results in an improper posterior (and use of the vague proper inverse Wishart equivalent is no better, in that it essentially yields a posterior with almost all its mass in a spike near $V = 0$). The constant prior requires $m$, the number of blocks, to be about $2k$ in order to achieve posterior propriety. Intuitively, at most $k$ blocks are needed for identifiability of $V$, so this is a strong indication of the inadequacy of the constant prior. In this regard, the hierarchical independence Jeffreys prior $\pi(V) = |I + V|^{-(k+1)/2}$ requires only $k$ blocks ($k + 1$ if the constant prior on $\beta$ is used) for posterior propriety.

We were not able to establish any admissibility results for these priors, but Tatsuya Kubokawa (private communication) has been able to show by different techniques that the $l = 1$ prior results in inadmissibility for Case 1 when $a_1 = 0$ and $a_2 < 1 + k/2 - 1/k$ and, for the special case of Case 2 of known $\beta$, when $a_1 = 0$ and $a_2 < (k + 1)/2 - 1/k$. Since the constant prior on $V$ is $a_1 = a_2 = 0$, this clearly shows that the constant prior is badly inadmissible (i.e., is far from the boundary of admissibility). Kubokawa's results do not settle the question of admissibility of the hierarchical independence Jeffreys prior.

Either the constant prior or the hierarchical independence Jeffreys prior 
is easy to handle computationally so, if computational ease is the primary 
concern, our recommendation would be to use the hierarchical independence 
Jeffreys prior. As mentioned earlier, however, it is not immediately obvious 
how to generalize this prior to other hierarchical settings, although replacing 
I by the covariance matrix from the lower level is a good general solution 
when the lower level has an exchangeable structure. 

The two proposed hierarchical reference priors, (a) $\pi(V) = \left[|I + V| \prod_{i<j}(d_i - d_j)\right]^{-1}$ and (b) $\pi(V) = \left[|V|^{(2k-1)/(2k)} \prod_{i<j}(d_i - d_j)\right]^{-1}$, are very appealing. They always result in proper posteriors if $m \ge 2$, a practically very useful and surprising fact when $m < k$ (explained in Section 2.2.4). They also both yield admissible (or nearly admissible) estimators in Cases 2 and 3, and are computationally of similar complexity. Choice (a) is an actual hierarchical reference prior, in that it can be derived by a reference prior argument. In contrast, (b) was a rather ad hoc modification. Hence (a) should be the preferred choice for the actual model we consider. Again, however, it can be difficult in more general hierarchical models to know what to use in place of $I$, and choice (b) does not require this additional input.

A very useful generalization (e.g., in common meta-analysis situations) would be to consider the setting

$$X_i \sim N_k(\theta_i, \Sigma_i), \qquad \theta_i \sim N_k(z_i \beta, V),$$

independently for $i = 1, \ldots, m$, where the $\Sigma_i$ are known positive-definite matrices, the $z_i$ are given $k \times h$ covariate matrices and $\beta$ is now $h \times 1$. Reasonable adaptations of the priors discussed above are:

1. Replace the covariance matrix $I$ in the definitions of the priors for $V$ by $\bar\Sigma = \frac{1}{m}\sum_{i=1}^m \Sigma_i$. (Again, this is not necessary if one uses the prior $\left[|V|^{(2k-1)/(2k)} \prod_{i<j}(d_i - d_j)\right]^{-1}$.)

2. Replace the prior in (5) by $\pi(\beta) = [1 + \beta^t Z^t Z \beta]^{-(h-1)/2}$, where $Z$ is the matrix $(z_1^t\ z_2^t\ \cdots\ z_m^t)^t$.

The results in the paper almost certainly go through for the generalization to known $\Sigma_i$. We would also guess that the results are true for the generalization to covariates (the extension was true for the case $k = 1$, as shown in [7]), but the technical details in establishing this appear to be formidable. Finally, a number of the computational strategies mentioned in Section 1.5 are adaptable to these generalizations, but we do not have experience in utilization of such adaptations and so cannot comment on their efficiency.

2. Posterior propriety and impropriety.

2.1. The marginal distribution. Posterior propriety and admissibility properties are determined by study of the marginal density of $X$, given by

$$m(x) = \int\!\!\int\!\!\int f(x \mid \theta)\, \pi(\theta \mid \beta, V)\, \pi(\beta)\pi(V)\, d\theta\, d\beta\, dV,$$

where

$$f(x \mid \theta) = \frac{1}{(2\pi)^{p/2}} \exp\left(-\frac12 \sum_{i=1}^m \|x_i - \theta_i\|^2\right)$$

and

$$\pi(\theta \mid \beta, H, D) = \frac{1}{(2\pi)^{p/2}\, |D|^{m/2}} \exp\left(-\frac12 \sum_{i=1}^m (\theta_i - \beta)^t H^t D^{-1} H (\theta_i - \beta)\right). \tag{9}$$

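Notational convention. It will be useful to write

$$m(x) \asymp g(x) \tag{10}$$

if there exist $C_1 > 0$ and $C_2 > 0$ such that, for all $x$,

$$C_1^{-1} g(x) \le m(x) \le C_2\, g(x). \tag{11}$$

(This is related to the notion of "credence," as defined in [29].) Thus, under Condition 1 we can write

$$m(x) \asymp \int\!\!\int\!\!\int\!\!\int f(x \mid \theta)\, \pi(\theta \mid \beta, H, D)\, \pi(\beta)\, \frac{\left[\prod_{i<j}(d_i - d_j)\right]^{l} I_{[d_1 > d_2 > \cdots > d_k]}}{|I + D|^{(a_2 - a_1)}\, |D|^{a_1}}\, dD\, dH\, d\beta\, d\theta. \tag{12}$$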

Standard calculations yield the following expressions for $m(x)$ for the various cases of $\pi(\beta)$, where we define $\bar x = \frac{1}{m}\sum_{i=1}^m x_i$.

Lemma 2.1. For $\pi(\beta) = 1$ (Case 1 scenario) and $m \ge 2$, the marginal density of $X$ satisfies

$$m(x) \propto \int\!\!\int \frac{1}{|I + D|^{(m-1)/2}} \exp\left(-\frac12 \sum_{i=1}^m (x_i - \bar x)^t H^t (I + D)^{-1} H (x_i - \bar x)\right) \pi(H, D)\, dD\, dH$$

$$\asymp \int\!\!\int \frac{\left[\prod_{i<j}(d_i - d_j)\right]^{l} I_{[d_1 > d_2 > \cdots > d_k]}}{|I + D|^{[a_2 - a_1 + (m-1)/2]}\, |D|^{a_1}} \exp\left(-\frac12 \sum_{i=1}^m (x_i - \bar x)^t H^t (I + D)^{-1} H (x_i - \bar x)\right) dD\, dH. \tag{13}$$

When $m = 1$, the marginal density of $X$ does not exist if $\pi(D)$ has infinite mass.
Lemma 2.2. If $\pi(\beta)$ is $N_k(0, A)$ (the Case 2 scenario, where we set $\beta^0 = 0$ for convenience), the marginal density of $X$ is

$$m(x) \propto \int\!\!\int \frac{1}{|I + D|^{(m-1)/2}\, |I + D + mHAH^t|^{1/2}} \exp\left(-\frac12 \sum_{i=1}^m (x_i - \bar x)^t H^t (I + D)^{-1} H (x_i - \bar x)\right) \times \exp\left(-\frac12 m (H\bar x)^t (I + D + mHAH^t)^{-1} (H\bar x)\right) \pi(H, D)\, dD\, dH$$

$$\asymp \int\!\!\int \frac{\left[\prod_{i<j}(d_i - d_j)\right]^{l} I_{[d_1 > d_2 > \cdots > d_k]}}{|I + D|^{[a_2 - a_1 + (m-1)/2]}\, |I + D + mHAH^t|^{1/2}\, |D|^{a_1}} \exp\left(-\frac12 \sum_{i=1}^m (x_i - \bar x)^t H^t (I + D)^{-1} H (x_i - \bar x)\right) \times \exp\left(-\frac12 m (H\bar x)^t (I + D + mHAH^t)^{-1} (H\bar x)\right) dD\, dH. \tag{14}$$



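Lemma 2.3. For $\pi(\beta \mid \lambda) = N_k(0, \lambda A)$ (the Case 3 scenario, where we set $\beta^0 = 0$ for convenience), where $\pi(\lambda)$ satisfies Condition 2, the marginal density of $X$ is

$$m(x) \propto \int\!\!\int\!\!\int \frac{1}{|I + D|^{(m-1)/2}\, |I + D + m\lambda HAH^t|^{1/2}} \exp\left(-\frac12 \sum_{i=1}^m (x_i - \bar x)^t H^t (I + D)^{-1} H (x_i - \bar x)\right) \times \exp\left(-\frac12 m (H\bar x)^t (I + D + m\lambda HAH^t)^{-1} (H\bar x)\right) \pi(H, D)\pi(\lambda)\, d\lambda\, dD\, dH$$

$$\asymp \int\!\!\int\!\!\int \frac{\left[\prod_{i<j}(d_i - d_j)\right]^{l} I_{[d_1 > d_2 > \cdots > d_k]}}{|I + D|^{[a_2 - a_1 + (m-1)/2]}\, |I + D + m\lambda HAH^t|^{1/2}\, |D|^{a_1}} \exp\left(-\frac12 \sum_{i=1}^m (x_i - \bar x)^t H^t (I + D)^{-1} H (x_i - \bar x)\right) \times \exp\left(-\frac12 m (H\bar x)^t (I + D + m\lambda HAH^t)^{-1} (H\bar x)\right) \pi(\lambda)\, d\lambda\, dD\, dH. \tag{15}$$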



2.2. Impropriety of the posterior. The next several theorems discuss the 
conditions under which the posterior distribution is proper. The following 
two lemmas are used. 
Lemma 2.4. If a $k \times k$ matrix $H$ is orthonormal, $h_i(x)$, $i = 1, 2, \ldots, m$, are vector-valued functions, and $A$ is positive semidefinite, then

$$0 < \exp\left(-\frac12 \sum_{i=1}^m \|h_i(x)\|^2\right) \le \exp\left(-\frac12 \sum_{i=1}^m (h_i(x))^t H^t (I + D + A)^{-1} H h_i(x)\right) \le 1. \tag{16}$$

Proof. The upper bound is clear. On the other hand, $H^t (I + D + A)^{-1} H \le I$ since $H$ is orthonormal and $d_i > 0$, so that

$$(h_i(x))^t H^t (I + D + A)^{-1} H h_i(x) \le \|h_i(x)\|^2,$$

which yields the lower bound in (16). □

Lemma 2.5. Let $\rho_1$ and $\rho_k$ be the maximum and the minimum eigenvalue of $A$, respectively. Then

$$|I + D| \le |I + D + m\rho_k I| \le |I + D + mHAH^t| \le |I + D + m\rho_1 I| \le (1 + m\rho_1)^k |I + D| \tag{17}$$

and

$$x^t (I + D + m\rho_1 I)^{-1} x \le x^t (I + D + mHAH^t)^{-1} x \le x^t (I + D + m\rho_k I)^{-1} x. \tag{18}$$

Proof. Using the notation $A \le B$ to denote that $B - A$ is nonnegative definite, we have

$$\rho_k I \le HAH^t \le \rho_1 I, \tag{19}$$

since $A$ is nonnegative definite and $H$ is orthonormal. Hence,

$$I + D + m\rho_k I \le I + D + mHAH^t \le I + D + m\rho_1 I, \tag{20}$$

from which (17) follows directly. From (20), clearly,

$$(I + D + m\rho_1 I)^{-1} \le (I + D + mHAH^t)^{-1} \le (I + D + m\rho_k I)^{-1}.$$

Equation (18) follows immediately, completing the proof. □
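The two chains of inequalities can be sanity-checked numerically. The Python sketch below (a spot check on random instances, not a proof) draws a random diagonal $D$ with positive entries, a random orthonormal $H$ from a QR factorization and a random nonnegative definite $A$, then evaluates every term in (17) and (18). The dimensions, sampling ranges and function name are arbitrary illustrative choices.

```python
import numpy as np


def lemma_2_5_chains(k=4, m=7, seed=1):
    """Evaluate both sides of (17) and (18) on one random instance."""
    rng = np.random.default_rng(seed)
    I = np.eye(k)
    D = np.diag(rng.uniform(0.1, 5.0, size=k))
    H, _ = np.linalg.qr(rng.standard_normal((k, k)))  # orthonormal
    G = rng.standard_normal((k, k))
    A = G @ G.T  # nonnegative definite
    rho = np.linalg.eigvalsh(A)  # ascending
    rho_k, rho_1 = rho[0], rho[-1]
    M = I + D + m * H @ A @ H.T
    det = np.linalg.det
    dets = [det(I + D), det(I + D + m * rho_k * I), det(M),
            det(I + D + m * rho_1 * I), (1 + m * rho_1) ** k * det(I + D)]
    x = rng.standard_normal(k)
    quads = [x @ np.linalg.solve(I + D + m * rho_1 * I, x),
             x @ np.linalg.solve(M, x),
             x @ np.linalg.solve(I + D + m * rho_k * I, x)]
    return dets, quads


dets, quads = lemma_2_5_chains()
print(dets)
print(quads)
```

Both returned lists should be nondecreasing (up to floating-point error) for every seed.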

Now we give the conditions under which the posterior distribution is proper for each of the three cases of $\pi(\beta)$.

2.2.1. Case 1 scenario. Since we are only considering improper $\pi(V)$, Lemma 2.1 shows that we need to consider only $m \ge 2$.



Theorem 2.6. If $\pi(\beta) = 1$, $m \ge 2$, $k \ge 2$, and $\pi(H, D)$ satisfies Condition 1, then the posterior distribution exists if and only if $a_1 < 1$ and $a_2 > (3 - m)/2 + (k - 1)l$.



Proof. The posterior distribution is proper if and only if $0 < m(x) < \infty$. The lower bound is clearly satisfied, so we only need to consider the upper bound. From (13) and Lemma 2.4, it is clear that, with $x$ considered fixed, the posterior exists if and only if

$$\int \frac{\left[\prod_{i<j}(d_i - d_j)\right]^{l} I_{[d_1 > d_2 > \cdots > d_k]}}{|I + D|^{[a_2 - a_1 + (m-1)/2]}\, |D|^{a_1}}\, dD < \infty. \tag{21}$$

To determine necessary conditions for (21) to hold, first fix $d_1, d_2, \ldots, d_{k-1}$ and consider the integral over $d_k$ in (21), which is

$$C \int_0^{d_{k-1}} \frac{1}{d_k^{a_1} (1 + d_k)^{[a_2 - a_1 + (m-1)/2]}} \left[\prod_{i=1}^{k-1}(d_i - d_k)\right]^{l} dd_k.$$

Clearly,

$$\frac{1}{(1 + d_k)^{[a_2 - a_1 + (m-1)/2]}} \left[\prod_{i=1}^{k-1}(d_i - d_k)\right]^{l} \longrightarrow \left[\prod_{i=1}^{k-1} d_i\right]^{l} > 0 \quad \text{as } d_k \to 0$$

and, when $a_1 \ge 1$,

$$\int_0^{d_{k-1}} \frac{1}{d_k^{a_1}}\, dd_k = \infty.$$

It follows that a necessary condition for (21) to hold is $a_1 < 1$.
Next, fix c?2, ds, . . . , dk and consider the integral over d\ in (21) 



C 



1 



d 2 < 1 (l + dl)[ a 2- a l+( m - 1 )/ 2 ] 



1 I 



J(di-di 



i=2 



dd\. 



Counting the orders of d\ for both the numerator and the denominator in 
the integral above, we see that this integral is infinite when (k — 1)1 — (a 2 + 
(m — l)/2) > —1. Thus another necessary condition for (21) to hold is 

3 — m -t \ i 

«-2 > — ^ h (fc — 1)/. 
Next we show that the conditions given in the theorem are sufficient. Since $0\le l\le 1$ and $0<d_i-d_j<d_i$ for $i<j$,
$$
\int \frac{[\prod_{i<j}(d_i-d_j)]^l\, I_{[d_1>d_2>\cdots>d_k]}}{|I+D|^{[a_2-a_1+(m-1)/2]}\,|D|^{a_1}}\,dD
\le \int \frac{[\prod_{i<j} d_i]^l}{|I+D|^{[a_2-a_1+(m-1)/2]}\,|D|^{a_1}}\,dD
=\prod_{i=1}^k \int_0^\infty \frac{d_i^{(k-i)l-a_1}}{(1+d_i)^{[a_2-a_1+(m-1)/2]}}\,dd_i .
$$
Since $a_1<1$ and $(k-i)l\ge 0$, it is clear that each of these integrals is finite near 0. For $d_i$ near infinity, the corresponding integral is finite if $(3-m)/2+(k-i)l<a_2$. This is true for all $i$ under the condition of the theorem, completing the proof. □

2.2.2. Case 2 scenario.

Theorem 2.7. If $\beta\sim N_k(0,A)$, $k\ge 2$, and $\pi(H,D)$ satisfies Condition 1, then the posterior distribution exists if and only if $a_1<1$ and $a_2>1-\frac m2+(k-1)l$.

Proof. Clearly, we only need to find the necessary and sufficient condition for $m(x)<\infty$. From (14) in Lemma 2.2 and Lemma 2.4, it is clear that
$$
m(x)\approx \iint \frac{[\prod_{i<j}(d_i-d_j)]^l\, I_{[d_1>d_2>\cdots>d_k]}}{|I+D|^{[a_2-a_1+(m-1)/2]}\,|I+D+mHAH^t|^{1/2}\,|D|^{a_1}}\,dD\,dH .
$$
Again letting $\rho_1$ and $\rho_k$ denote the maximum and minimum eigenvalue of $A$, it follows from (17) that $m(x)<\infty$ if and only if
$$
\int \frac{[\prod_{i<j}(d_i-d_j)]^l}{|I+D|^{(a_2-a_1+m/2)}\,|D|^{a_1}}\, I_{[d_1>d_2>\cdots>d_k]}\, dD<\infty .
$$
The proof then proceeds in identical fashion to that of Theorem 2.6. □
2.2.3. Case 3 scenario. 

Theorem 2.8. Suppose that $\beta\sim N_k(0,\lambda A)$, $k\ge 2$, $\pi(\lambda)$ satisfies Condition 2 and $\pi(H,D)$ satisfies Condition 1. The necessary and sufficient conditions for the posterior distribution to exist are $a_1<1$, $a_2>1-\frac m2+(k-1)l$ and $b>1-\frac k2$.
Proof. As in the proof of Theorem 2.7, it is clear that
$$
m(x)\approx \iiint \frac{[\prod_{i<j}(d_i-d_j)]^l\, I_{[d_1>d_2>\cdots>d_k]}}{|I+D|^{(a_2-a_1+(m-1)/2)}\,|I+D+m\lambda HAH^t|^{1/2}\,|D|^{a_1}}\,\pi(\lambda)\,d\lambda\,dD\,dH .
$$
Using (17), it is clear that the posterior density exists if and only if
$$
I=\iint \frac{[\prod_{i<j}(d_i-d_j)]^l\, I_{[d_1>d_2>\cdots>d_k]}}{|I+D|^{(a_2-a_1+(m-1)/2)}\,|D|^{a_1}\,|(1+c\lambda)I+D|^{1/2}}\,\pi(\lambda)\,d\lambda\,dD<\infty . \tag{22}
$$
Clearly,
$$
\begin{aligned}
I&\ge \int_0^1\Big(\int \frac{[\prod_{i<j}(d_i-d_j)]^l\, I_{[d_1>\cdots>d_k]}}{|I+D|^{[a_2-a_1+(m-1)/2]}\,|D|^{a_1}\,|(1+c\lambda)I+D|^{1/2}}\,dD\Big)\pi(\lambda)\,d\lambda\\
&\ge \int_0^1\Big(\int \frac{[\prod_{i<j}(d_i-d_j)]^l\, I_{[d_1>\cdots>d_k]}}{|I+D|^{[a_2-a_1+(m-1)/2]}\,|D|^{a_1}\,|(1+c)I+D|^{1/2}}\,dD\Big)\pi(\lambda)\,d\lambda\\
&\ge C\int \frac{[\prod_{i<j}(d_i-d_j)]^l\, I_{[d_1>\cdots>d_k]}}{|I+D|^{(a_2-a_1+m/2)}\,|D|^{a_1}}\,dD,
\end{aligned}
$$
the last inequality holding because of Condition 2(i). Proceeding as in Theorem 2.6, a necessary condition for $I$ to be finite is
$$
a_1<1\quad\text{and}\quad a_2>1-\frac m2+(k-1)l .
$$
On the other hand, by (ii) of Condition 2,
$$
\begin{aligned}
I&\ge \int_1^\infty\Big(\int \frac{[\prod_{i<j}(d_i-d_j)]^l\, I_{[d_1>\cdots>d_k]}}{|I+D|^{[a_2-a_1+(m-1)/2]}\,|D|^{a_1}\,|(1+c\lambda)I+D|^{1/2}}\,dD\Big)\pi(\lambda)\,d\lambda\\
&\ge C\int_1^\infty \frac{1}{(1+c\lambda)^{k/2}\lambda^{b}}\,d\lambda \int \frac{[\prod_{i<j}(d_i-d_j)]^l\, I_{[d_1>\cdots>d_k]}}{|I+D|^{(a_2-a_1+m/2)}\,|D|^{a_1}}\,dD .
\end{aligned}
$$
This integral is infinite when $b\le 1-k/2$. So another necessary condition for (22) to hold is $b>1-k/2$.

Next let us prove that the conditions are sufficient. Using
$$
\prod_{j=1}^k\frac{1}{(1+c\lambda+d_j)^{1/2}}\le \frac{1}{(1+c\lambda+d_1)^{1/2}(1+c\lambda)^{(k-1)/2}},
$$
we have
$$
\begin{aligned}
I&\le \iint \frac{[\prod_{i<j}(d_i-d_j)]^l\, I_{[d_1>d_2>\cdots>d_k]}}{|I+D|^{[a_2-a_1+(m-1)/2]}\,(1+c\lambda+d_1)^{1/2}(1+c\lambda)^{(k-1)/2}\,|D|^{a_1}}\,\pi(\lambda)\,d\lambda\,dD\\
&\le \iint \frac{(\prod_{i<j} d_i)^l}{|I+D|^{[a_2-a_1+(m-1)/2]}\,(1+c\lambda+d_1)^{1/2}(1+c\lambda)^{(k-1)/2}\,|D|^{a_1}}\,\pi(\lambda)\,d\lambda\,dD .
\end{aligned}
$$
As in the proof of Theorem 2.7, the integrals over $d_2,\ldots,d_k$ are finite under the stated conditions, so that
$$
\begin{aligned}
I&\le C\iint \frac{d_1^{[(k-1)l-a_1]}}{(1+d_1)^{[a_2-a_1+(m-1)/2]}(1+c\lambda+d_1)^{1/2}(1+c\lambda)^{(k-1)/2}}\,\pi(\lambda)\,d\lambda\,dd_1\\
&\le C\iint (1+d_1)^{-[a_2+(m-1)/2-(k-1)l]}(1+c\lambda+d_1)^{-1/2}(1+c\lambda)^{-(k-1)/2}\,\pi(\lambda)\,d\lambda\,dd_1 .
\end{aligned}
$$
Break this integral up into four integrals over $(0,c)\times(0,c)$, $(0,c)\times(c,\infty)$, $(c,\infty)\times(0,c)$ and $(c,\infty)\times(c,\infty)$. Bounding the first three integrals is easy, using Condition 2. The last integral is bounded as in the proof of Lemma 1 of [7]. □



2.2.4. Summary of posterior propriety and impropriety. The cases of most interest are $l=0$ and $l=1$. The following corollaries of Theorems 2.6, 2.7 and 2.8 deal with these cases.

Corollary 2.9. Suppose $l=0$ and $k\ge 2$.

(a) In the Case 1 scenario ($\pi(\beta)=1$), when $m\ge 2$, the posterior distribution exists if and only if $a_1<1$ and $a_2>\frac{3-m}{2}$.

(b) In the Case 2 scenario ($\beta\sim N_k(0,A)$), the posterior distribution exists if and only if $a_1<1$ and $a_2>1-\frac m2$.

(c) In the Case 3 scenario ($\beta\sim N_k(0,\lambda A)$, $\lambda\sim\pi(\lambda)$), the posterior distribution exists if and only if $a_1<1$, $a_2>1-\frac m2$ and $b>1-\frac k2$.

Corollary 2.10. Suppose $l=1$ and $k\ge 2$.

(a) In the Case 1 scenario ($\pi(\beta)=1$), when $m\ge 2$, the posterior distribution exists if and only if $a_1<1$ and $a_2>k-\frac{m-1}{2}$.

(b) In the Case 2 scenario ($\beta\sim N_k(0,A)$), the posterior distribution exists if and only if $a_1<1$ and $a_2>k-\frac m2$.

(c) In the Case 3 scenario ($\beta\sim N_k(0,\lambda A)$, $\lambda\sim\pi(\lambda)$), the posterior distribution exists when $a_1<1$, $a_2>k-\frac m2$ and $b>1-\frac k2$.

It follows that the most commonly used objective priors for covariance matrices cannot be used in the hierarchical setting. The nonhierarchical independence Jeffreys prior [$l=1$, $a_1=a_2=(k+1)/2$] and the nonhierarchical reference prior ($l=0$, $a_1=a_2=1$) yield improper posteriors, since both violate $a_1<1$. The constant prior ($l=1$, $a_1=a_2=0$) yields a proper posterior only when $2k<m-1$ for Case 1, and when $2k<m$ for Case 2 and Case 3. This implies that the number of blocks $m$ has to be at least $2k+2$ for Case 1 and $2k+1$ for Cases 2 and 3.

In contrast, the hierarchical independence Jeffreys prior [$l=1$, $a_1=0$, $a_2=(k+1)/2$] yields a proper posterior when $m>k$ for Case 1 and $m>k-1$ for Cases 2 and 3, considerably weaker conditions. Furthermore, the hierarchical reference prior (a) ($l=0$, $a_1=0$, $a_2=1$) and the hierarchical reference prior (b) [$l=0$, $a_1=a_2=(2k-1)/(2k)$] always yield a proper posterior, except when $m=1$ in Case 1.
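The propriety conditions of Theorems 2.6-2.8 reduce to simple inequalities in $(l,a_1,a_2,m,k,b)$, so the block counts quoted above can be reproduced mechanically. The sketch below uses hypothetical helper names (`posterior_proper`, `min_blocks` are ours). It assumes an improper $\pi(V)$, so that $m=1$ is excluded in Case 1 as in Lemma 2.1, and uses $b=1/2$ in Case 3 purely for illustration; exact fractions avoid rounding problems at the strict boundaries.

```python
from fractions import Fraction as F

def posterior_proper(case, l, a1, a2, m, k, b=None):
    """Propriety conditions of Theorems 2.6-2.8 for an improper pi(V) satisfying Condition 1.

    case 1: pi(beta) = 1; case 2: beta ~ N_k(0, A);
    case 3: beta ~ N_k(0, lambda A) with pi(lambda) satisfying Condition 2 (exponent b).
    """
    if a1 >= 1:
        return False
    if case == 1:
        # Lemma 2.1 rules out m = 1; Theorem 2.6 covers m >= 2.
        return m >= 2 and a2 > F(3 - m, 2) + (k - 1) * l
    threshold = 1 - F(m, 2) + (k - 1) * l
    if case == 2:
        return a2 > threshold
    if case == 3:
        return a2 > threshold and b > 1 - F(k, 2)
    raise ValueError("case must be 1, 2 or 3")

def min_blocks(case, l, a1, a2, k, b=None, m_max=500):
    """Smallest m giving a proper posterior, or None if there is none up to m_max."""
    for m in range(1, m_max + 1):
        if posterior_proper(case, l, a1, a2, m, k, b):
            return m
    return None

k = 5
priors = {  # (l, a1, a2) as listed in Section 2.2.4
    "constant": (1, F(0), F(0)),
    "nonhierarchical Jeffreys": (1, F(k + 1, 2), F(k + 1, 2)),
    "nonhierarchical reference": (0, F(1), F(1)),
    "hierarchical Jeffreys": (1, F(0), F(k + 1, 2)),
    "hierarchical reference (a)": (0, F(0), F(1)),
    "hierarchical reference (b)": (0, F(2 * k - 1, 2 * k), F(2 * k - 1, 2 * k)),
}
for name, (l, a1, a2) in priors.items():
    print(name, [min_blocks(case, l, a1, a2, k, b=F(1, 2)) for case in (1, 2, 3)])
```

For $k=5$ this reproduces $m\ge 2k+2=12$ and $m\ge 2k+1=11$ for the constant prior in Cases 1 and 2, $m\ge k+1$ and $m\ge k$ for the hierarchical independence Jeffreys prior, and no valid $m$ at all for the two nonhierarchical priors.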

It is quite surprising that posterior propriety for the hierarchical reference 
priors does not require m to grow with k (as is necessary for the hierarchical 
independence Jeffreys prior). One needs on the order of m = k blocks in 
order for the hyper-covariance matrix V to be identifiable, which is usually 
viewed as being equivalent to posterior propriety. Such equivalence is clearly 
not the case here; in the simplest Case 1 scenario, for instance, only m = 2 
blocks are needed for posterior propriety of the reference priors, regardless 
of the value of k. 

To understand why this is so, consider the transformed version of the 
problem in Section 1.4. Note that the domain of H is a compact set and the 
reference prior assigns a proper uniform distribution to this set, so the only 
parameters that intuitively need data to have proper posteriors are $\beta$ and
$D$. These vectors consist of $2k$ unknowns, which intuitively can be handled
by the 2k coordinate observations corresponding to m = 2. This general 
posterior propriety is a very attractive property of the hierarchical reference 
priors in that it is often difficult in complicated hierarchical models to ensure 
that conditions such as m > k are satisfied at all levels and components of 
the hierarchy. 

3. Admissibility and inadmissibility. 

3.1. Introduction. In this section we give conditions under which the hierarchical Bayes estimate $\delta^\pi(x)$ (the posterior mean) of $\theta$ is admissible and inadmissible for quadratic loss (2). We restrict consideration to the priors for which $l=0$, since these are the priors we will recommend and analysis for $l>0$ requires different techniques.

Our study utilizes the following powerful results from [9]. Define
$$
\bar m(r)=\int m(x)\,d\phi_r(x) \tag{23}
$$
and
$$
\underline m(r)=\int \frac{1}{m(x)}\,d\phi_r(x), \tag{24}
$$
where $\phi_r(\cdot)$ is the uniform probability measure on the surface of the sphere of radius $r=\|x\|$.

Result 3.1. If $\delta^\pi(x)-x$ is uniformly bounded and
$$
\int_c^\infty \big[r^{mk-1}\bar m(r)\big]^{-1}\,dr=\infty \tag{25}
$$
for some $c>0$, then $\delta^\pi(x)$ is admissible.

Result 3.2. If
$$
\int_c^\infty r^{1-mk}\,\underline m(r)\,dr<\infty \tag{26}
$$
for some $c>0$, then $\delta^\pi(x)$ is inadmissible.

3.2. Preliminary lemmas. The following lemmas are needed.

Lemma 3.3. (a) If $a<1$, $r+a>1$ and $c_1$ and $c_2$ are positive constants, then
$$
f(v)=\int_0^\infty \frac{1}{(c_1+d)^r d^a}\exp\Big(-\frac{v}{2(c_2+d)}\Big)\,dd\approx C_1\min\{C_2,\,v^{1-r-a}\}, \tag{27}
$$
for some positive constants $C_1$ and $C_2$.

(b) If $a>-1$, $\mu>0$ and $v>0$, then
$$
g(\mu,v)=\int_0^\mu t^a e^{-vt}\,dt\le C\min\{v^{-(a+1)},\,\mu^{a+1}\} \tag{28}
$$
for some positive constant $C$.

For the proof see the Appendix.
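Lemma 3.3 can be illustrated numerically. The sketch below assumes $c_1=c_2=1$ and the illustrative values $a=1/2$, $r=3/2$, for which $f(0)=B(1/2,1)=2$ and, by the substitution $d=vt$, $v^{r+a-1}f(v)\to\Gamma(r+a-1)\,2^{r+a-1}=2$ as $v\to\infty$. It also checks (28) in the case $a=-1/2$ (the exponent of $t_j$ in the proof of Theorem 3.14), where $g$ has a closed form in terms of the error function and the constant $C=\max\{\Gamma(1/2),2\}=2$ works. The quadrature grid is an assumption, not a tuned choice.

```python
import math

def f(v, a, r, h=0.01, s_min=-80.0, s_max=80.0):
    """Trapezoid approximation of f(v) in (27) with c1 = c2 = 1.

    Substituting d = exp(s) turns the integral over (0, inf) into one over the real
    line with a smooth, exponentially decaying integrand (requires a < 1, r + a > 1).
    """
    n = int(round((s_max - s_min) / h))
    total = 0.0
    for i in range(n + 1):
        d = math.exp(s_min + i * h)
        # d^{-a} (1 + d)^{-r} exp(-v / (2(1 + d))) times the Jacobian dd = d ds
        val = d ** (1 - a) * (1 + d) ** (-r) * math.exp(-v / (2 * (1 + d)))
        total += 0.5 * val if i in (0, n) else val
    return total * h

def g_half(mu, v):
    """Lemma 3.3(b) with a = -1/2 in closed form: sqrt(pi/v) * erf(sqrt(v * mu))."""
    return math.sqrt(math.pi / v) * math.erf(math.sqrt(v * mu))

a, r = 0.5, 1.5                      # illustrative choice: a < 1 and r + a = 2 > 1
print(f(0.0, a, r))                  # f(0) = B(1/2, 1) = 2
for v in (1e2, 1e3, 1e4):
    # v^{r+a-1} f(v) should level off at Gamma(1) * 2^1 = 2
    print(v, v ** (r + a - 1) * f(v, a, r))

# (28) with C = 2: g(mu, v) <= 2 * min(v^{-1/2}, mu^{1/2})
for mu, v in [(0.01, 1.0), (1.0, 100.0), (5.0, 0.2)]:
    print(g_half(mu, v) <= 2 * min(v ** -0.5, mu ** 0.5))
```

The levelling-off of $v^{r+a-1}f(v)$ is the $v^{1-r-a}$ branch of (27); the bounded value $f(0)$ is the constant branch.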
Lemma 3.4. Assuming the integrals exist,
$$
\iint g(H^tDH)\,I_{[d_1>d_2>\cdots>d_k]}\,dD\,dH=\frac{1}{k!}\iint g(H^tDH)\,dD\,dH. \tag{29}
$$

Proof. Suppose that $d_1>d_2>\cdots>d_k>0$ are eigenvalues of $V$ and $(d_1^*,d_2^*,\ldots,d_k^*)$ is a different ordering of $(d_1,d_2,\ldots,d_k)$. Let $D^*=\mathrm{diag}(d_1^*,d_2^*,\ldots,d_k^*)$. Since there exists an orthonormal matrix $H^*$ such that $D=H^{*t}D^*H^*$, it follows that
$$
\begin{aligned}
\iint g(H^tDH)\,I_{[d_1>\cdots>d_k]}\,dD\,dH
&=\iint g\big((H^*H)^tD^*(H^*H)\big)\,I_{[d_1>\cdots>d_k]}\,dD\,dH\\
&=\iint g\big((H^*H)^tD^*(H^*H)\big)\,I^*\,dD^*\,dH,
\end{aligned}\tag{30}
$$
the last step following from the change of variables from $D\to D^*$ (which has Jacobian 1), where $I^*$ corresponds to the new ordering. Next, note that, since $dH$ represents the invariant Haar density,
$$
\iint g\big((H^*H)^tD^*(H^*H)\big)\,I^*\,dD^*\,dH=\iint g(H^tD^*H)\,I^*\,dD^*\,dH .
$$
Hence $\iint g(H^tDH)\,I\,dD\,dH$ is the same for any ordering $I$ of the eigenvalues, and the result follows since there are $k!$ orderings. □

Notational convention. We need to generalize the notation in (10). Indeed, let
$$
m(x)\approx g(c,x)\quad\text{stand for}\quad g(c,x)\le m(x)\le g(c',x) \tag{31}
$$
for some (possibly vector) $c$ and $c'$. For instance, in (33) below, $c=(C_1,C_2,C_3)$. The earlier notation was the special case where $g(c,x)=c\,g(x)$.

We conclude this section with presentation of needed upper and lower bounds (using the $\approx$ notation) for the marginal densities in Cases 1, 2 and 3.

Lemma 3.5. In the Case 1 scenario and with $l=0$,
$$
m(x)\approx C\iint \frac{1}{|D|^{a_1}|I+D|^{(m-1)/2+a_2-a_1}}\exp\Big(-\frac12\sum_{i=1}^m (x_i-\bar x)^tH^t(I+D)^{-1}H(x_i-\bar x)\Big)\,dD\,dH. \tag{32}
$$

Proof. This follows directly from (13) in Lemma 2.1 and Lemma 3.4. □



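Lemma 3.6. In the Case 2 scenario and with $l=0$,
$$
m(x)\approx C_1\iint \frac{1}{|D|^{a_1}|C_2I+D|^{m/2+a_2-a_1}}\exp\Big(-\frac12\sum_{i=1}^m x_i^tH^t(C_3I+D)^{-1}Hx_i\Big)\,dD\,dH. \tag{33}
$$

Proof. From (14) in Lemma 2.2 and Lemma 3.4,
$$
\begin{aligned}
m(x)\approx\iint &\frac{1}{|D|^{a_1}|I+D|^{(m-1)/2+a_2-a_1}|I+D+mHAH^t|^{1/2}}\\
&\times\exp\Big(-\frac12\sum_{i=1}^m (x_i-\bar x)^tH^t(I+D)^{-1}H(x_i-\bar x)\Big)\\
&\times\exp\Big(-\frac12\,m\,(H\bar x)^t(I+D+mHAH^t)^{-1}(H\bar x)\Big)\,dD\,dH.
\end{aligned}\tag{34}
$$
Applying Lemma 2.5 to (34),
$$
\begin{aligned}
m(x)&\le C\iint \frac{1}{|D|^{a_1}|I+D|^{(m-1)/2+a_2-a_1}|I+D+m\rho_kI|^{1/2}}
\exp\Big(-\frac12\sum_{i=1}^m (x_i-\bar x)^tH^t(I+D)^{-1}H(x_i-\bar x)\Big)\\
&\qquad\times\exp\Big(-\frac12\,m\,(H\bar x)^t(I+D+m\rho_1I)^{-1}(H\bar x)\Big)\,dD\,dH\\
&\le C\iint \frac{1}{|D|^{a_1}|I+D|^{(m-1)/2+a_2-a_1}|I+D|^{1/2}}
\exp\Big(-\frac12\sum_{i=1}^m (x_i-\bar x)^tH^t(I+D+m\rho_1I)^{-1}H(x_i-\bar x)\Big)\\
&\qquad\times\exp\Big(-\frac12\,m\,(H\bar x)^t(I+D+m\rho_1I)^{-1}(H\bar x)\Big)\,dD\,dH\\
&= C\iint \frac{1}{|D|^{a_1}|I+D|^{m/2+a_2-a_1}}\exp\Big(-\frac12\sum_{i=1}^m x_i^tH^t(I+D+m\rho_1I)^{-1}Hx_i\Big)\,dD\,dH,
\end{aligned}
$$
where the last step uses $\sum_{i=1}^m (x_i-\bar x)^tM(x_i-\bar x)+m\,\bar x^tM\bar x=\sum_{i=1}^m x_i^tMx_i$. Similarly,
$$
m(x)\ge C\iint \frac{1}{|D|^{a_1}|I+D+m\rho_1I|^{m/2+a_2-a_1}}\exp\Big(-\frac12\sum_{i=1}^m x_i^tH^t(I+D)^{-1}Hx_i\Big)\,dD\,dH .
$$
This completes the proof. □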
Lemma 3.7. In the Case 3 scenario and with $l=0$,
$$
\begin{aligned}
m(x)\approx C_1\iiint &\frac{1}{|D|^{a_1}|I+D|^{(m-1)/2+a_2-a_1}|(1+C_2\lambda)I+D|^{1/2}}
\exp\Big(-\frac12\sum_{i=1}^m (x_i-\bar x)^tH^t(I+D)^{-1}H(x_i-\bar x)\Big)\\
&\times\exp\Big(-\frac12\,m\,(H\bar x)^t[(1+C_3\lambda)I+D]^{-1}(H\bar x)\Big)\,\pi(\lambda)\,d\lambda\,dD\,dH .
\end{aligned}\tag{35}
$$

Proof. From (15) in Lemma 2.3 and Lemma 3.4,
$$
\begin{aligned}
m(x)\approx\iiint &\frac{1}{|D|^{a_1}|I+D|^{(m-1)/2+a_2-a_1}|I+D+m\lambda HAH^t|^{1/2}}
\exp\Big(-\frac12\sum_{i=1}^m (x_i-\bar x)^tH^t(I+D)^{-1}H(x_i-\bar x)\Big)\\
&\times\exp\Big(-\frac12\,m\,(H\bar x)^t(I+D+m\lambda HAH^t)^{-1}(H\bar x)\Big)\,\pi(\lambda)\,d\lambda\,dD\,dH .
\end{aligned}
$$
The proof is then exactly like that of Lemma 3.6. □

3.3. Uniformly bounded property. Let $\delta^\pi(x)$ be the posterior mean of $\theta$ with respect to the posterior distribution. To prove admissibility by Brown's results, we first need to show that $\delta^\pi(x)-x$ is uniformly bounded. Let $\delta^\pi(x)_{p\times 1}=(\delta_1^\pi(x)^t,\delta_2^\pi(x)^t,\ldots,\delta_m^\pi(x)^t)^t$, so that $\delta_i^\pi(x)$ is the subvector of $\delta^\pi(x)$ corresponding to $\theta_i$. By symmetry, it is clearly sufficient to show that $\delta_1^\pi(x)-x_1$ is uniformly bounded.

Lemma 3.8. Suppose that $z_1,z_2,\ldots,z_m$ are $k\times 1$ vectors and $y$ is the $k\times 1$ vector $(y_1,y_2,\ldots,y_k)^t$, with $y_n=(\sum_{i=1}^m z_{in}^2)^{1/2}$, where $z_{ij}$ is the $j$th element of $z_i$. Define
$$
g(c,D,z_1,z_2,\ldots,z_m)=\frac{C_1}{|D|^u|C_2I+D|^v}\exp\Big(-\frac12\sum_{i=1}^m z_i^t(C_3I+D)^{-1}z_i\Big),
$$
where the $C_i$ are positive constants. If $u+v>1$ and $u<1$, then
$$
\frac{\int\|(I+D)^{-1}y\|\,g(c,D,z_1,z_2,\ldots,z_m)\,dD}{\int g(c',D,z_1,z_2,\ldots,z_m)\,dD} \tag{36}
$$
is uniformly bounded over $z_1,z_2,\ldots,z_m$.

Proof. In (36), since $\|(I+D)^{-1}y\|\le\sum_{n=1}^k |y_n|/(1+d_n)$ and $\sum_{i=1}^m z_i^t(C_3I+D)^{-1}z_i=\sum_{j=1}^k y_j^2/(C_3+d_j)$,
$$
|\text{Numerator}|\le \sum_{n=1}^k |y_n|\int \frac{1}{1+d_n}\prod_{j=1}^k \frac{C_1}{d_j^u(C_2+d_j)^v}\exp\Big(-\frac{y_j^2}{2(C_3+d_j)}\Big)\,dD .
$$
For each $n$ we will bound the $k$-dimensional integral. If $j\ne n$, by Lemma 3.3 with $a=u$ and $r=v$ it is clear that
$$
\int_0^\infty \frac{1}{d_j^u(C_2+d_j)^v}\exp\Big(-\frac{y_j^2}{2(C_3+d_j)}\Big)\,dd_j\le C_1\min\{C_2,(y_j^2)^{1-u-v}\}.
$$
If $j=n$, applying Lemma 3.3 with $a=u$ and $r=v+1$ [again using the fact that $(1+d_n)/(C_2+d_n)$ is uniformly bounded] yields
$$
\int_0^\infty \frac{1}{1+d_n}\,\frac{1}{d_n^u(C_2+d_n)^v}\exp\Big(-\frac{y_n^2}{2(C_3+d_n)}\Big)\,dd_n\le C_1\min\{C_2,(y_n^2)^{-u-v}\}.
$$
Therefore,
$$
|\text{Numerator}|\le\sum_{n=1}^k\Big[|y_n|\,C_1\min\{C_2,(y_n^2)^{-u-v}\}\prod_{j\ne n}C_1\min\{C_2,(y_j^2)^{1-u-v}\}\Big].
$$
In (36),
$$
\text{Denominator}=C_1'\prod_{j=1}^k\int_0^\infty \frac{1}{d_j^u(C_2'+d_j)^v}\exp\Big(-\frac{y_j^2}{2(C_3'+d_j)}\Big)\,dd_j .
$$
Applying the lower bound of Lemma 3.3, with $a=u$ and $r=v$, yields
$$
\int_0^\infty \frac{1}{d_j^u(C_2'+d_j)^v}\exp\Big(-\frac{y_j^2}{2(C_3'+d_j)}\Big)\,dd_j\ge C_1'\min\{C_2',(y_j^2)^{1-u-v}\}.
$$
Thus
$$
\text{Denominator}\ge\prod_{j=1}^k C_1'\min\{C_2',(y_j^2)^{1-u-v}\}.
$$
Combining the numerator and the denominator, we have
$$
\frac{\int\|(I+D)^{-1}y\|\,g(c,D,z_1,\ldots,z_m)\,dD}{\int g(c',D,z_1,\ldots,z_m)\,dD}
\le\sum_{n=1}^k\frac{|y_n|\,C_1\min\{C_2,(y_n^2)^{-u-v}\}}{C_1'\min\{C_2',(y_n^2)^{1-u-v}\}}\prod_{j\ne n}\frac{C_1\min\{C_2,(y_j^2)^{1-u-v}\}}{C_1'\min\{C_2',(y_j^2)^{1-u-v}\}}.
$$
Clearly
$$
\frac{C_1\min\{C_2,(y_j^2)^{1-u-v}\}}{C_1'\min\{C_2',(y_j^2)^{1-u-v}\}}\le C .
$$
Using the condition $u+v>1$, we have that for large $y_n^2$ the ratio
$$
\frac{|y_n|\,C_1\min\{C_2,(y_n^2)^{-u-v}\}}{C_1'\min\{C_2',(y_n^2)^{1-u-v}\}}
$$
behaves as $|y_n|/y_n^2=1/|y_n|$, while for small $y_n^2$ it behaves as $C_3|y_n|\le C_4$, so that the ratio is clearly uniformly bounded. Thus
$$
\frac{\int\|(I+D)^{-1}y\|\,g(c,D,z_1,z_2,\ldots,z_m)\,dD}{\int g(c',D,z_1,z_2,\ldots,z_m)\,dD}\le K,
$$
completing the proof. □



Theorem 3.9. Assume that $\pi(\beta)=1$, $m\ge 2$, $k\ge 2$, and $\pi(H,D)$ satisfies Condition 1. Also suppose that we choose $l=0$. If $a_1<1$ and $a_2>\frac{3-m}{2}$, then $\delta^\pi(x)-x$ is uniformly bounded.

Proof. We only need to show that $\delta_1^\pi(x)-x_1$ is uniformly bounded. It is well known that
$$
\delta_1^\pi(x)-x_1=\frac{(\nabla m(x))_1}{m(x)}, \tag{37}
$$
where $\nabla$ denotes the gradient. Exactly as in the proof of Lemma 3.5, it can be shown that
$$
\begin{aligned}
\|(\nabla m(x))_1\|&\le\iint \frac{\|(I+H^tDH)^{-1}(x_1-\bar x)\|}{|I+D|^{(m-1)/2}}\exp\Big(-\frac12\sum_{i=1}^m(x_i-\bar x)^tH^t(I+D)^{-1}H(x_i-\bar x)\Big)\,\pi(H,D)\,dH\,dD\\
&\le C\iint \frac{\|(I+D)^{-1}H(x_1-\bar x)\|}{|D|^{a_1}|I+D|^{(m-1)/2+a_2-a_1}}\exp\Big(-\frac12\sum_{i=1}^m(x_i-\bar x)^tH^t(I+D)^{-1}H(x_i-\bar x)\Big)\,dD\,dH .
\end{aligned}
$$
Hence, defining $z_i=H(x_i-\bar x)$, and using the lower bound in Lemma 3.5 for the denominator in (37), one obtains (for appropriate constants $c,c'$)
$$
\|\delta_1^\pi(x)-x_1\|\le C\,\frac{\iint\|(I+D)^{-1}z_1\|\,g(c,D,z_1,\ldots,z_m)\,dD\,dH}{\iint g(c',D,z_1,\ldots,z_m)\,dD\,dH}
\le C\,\frac{\iint\|(I+D)^{-1}y\|\,g(c,D,z_1,\ldots,z_m)\,dD\,dH}{\iint g(c',D,z_1,\ldots,z_m)\,dD\,dH},
$$
where $y$ and $g$ are as in Lemma 3.8, with $u=a_1$ and $v=a_2-a_1+(m-1)/2$ (the second inequality uses $|z_{1n}|\le y_n$).

Now Lemma 3.8 shows that, if $u+v=a_2+(m-1)/2>1$ and $u=a_1<1$, then
$$
\int\|(I+D)^{-1}y\|\,g(c,D,z_1,\ldots,z_m)\,dD\le K\int g(c',D,z_1,\ldots,z_m)\,dD .
$$
Hence $\|\delta_1^\pi(x)-x_1\|\le K$ and the theorem is established. □
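Identity (37) is the standard gradient (Tweedie-type) formula for a posterior mean under a normal likelihood with identity covariance. The sketch below checks it in a hypothetical one-dimensional example with a three-point prior on $\theta$ (support and weights are arbitrary assumptions), comparing the posterior mean minus $x$ with $m'(x)/m(x)$, where $m'$ is approximated by a central difference.

```python
import math

# Hypothetical 1-D illustration of (37): X | theta ~ N(theta, 1) and a
# three-point prior on theta (support and weights chosen arbitrarily).
support = [-2.0, 0.0, 3.0]
weights = [0.3, 0.5, 0.2]

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def marginal(x):
    return sum(w * phi(x - t) for w, t in zip(weights, support))

def posterior_mean(x):
    return sum(w * t * phi(x - t) for w, t in zip(weights, support)) / marginal(x)

def gradient_form(x, eps=1e-5):
    # m'(x) / m(x), with m'(x) from a central difference
    dm = (marginal(x + eps) - marginal(x - eps)) / (2 * eps)
    return dm / marginal(x)

for x in (-4.0, 0.5, 2.0, 6.0):
    print(x, posterior_mean(x) - x, gradient_form(x))
```

In the proof above the same identity is applied to the block $x_1$, and the work consists in bounding the numerator $(\nabla m(x))_1$ relative to $m(x)$.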

Theorem 3.10. Assume that $\pi(\beta)$ is $N_k(\beta^0,A)$, $k\ge 2$, and $\pi(H,D)$ satisfies Condition 1. Also suppose that we choose $l=0$. If $a_1<1$ and $a_2>1-\frac m2$, then $\delta^\pi(x)-x$ is uniformly bounded.

Proof. The proof is very similar to that in Theorem 3.9:
$$
\begin{aligned}
\|(\nabla m(x))_1\|\le\iint &\|(I+V)^{-1}(x_1-\bar x)+(I+V+mA)^{-1}\bar x\|\,\frac{1}{|I+D|^{[a_2-a_1+(m-1)/2]}|I+D+mHAH^t|^{1/2}|D|^{a_1}}\\
&\times\exp\Big(-\frac12\sum_{i=1}^m(x_i-\bar x)^tH^t(I+D)^{-1}H(x_i-\bar x)\Big)\\
&\times\exp\Big(-\frac12\,m\,(H\bar x)^t(I+D+mHAH^t)^{-1}(H\bar x)\Big)\,dD\,dH .
\end{aligned}
$$
Note that
$$
\|(I+V)^{-1}(x_1-\bar x)+(I+V+mA)^{-1}\bar x\|\le\|(I+V)^{-1}(x_1-\bar x)\|+\|(I+V+mA)^{-1}\bar x\|. \tag{38}
$$
One now proceeds as in the proof of Theorem 3.9 with each term of (38), making use of Lemma 3.6 and arguments similar to the proof in that lemma. □
Theorem 3.11. Assume that $\pi(\beta)$ is $N_k(\beta^0,\lambda A)$, $k\ge 2$, $\pi(H,D)$ satisfies Condition 1 and $\pi(\lambda)$ satisfies Condition 2. Also suppose that we choose $l=0$. If $a_1<1$, $a_2>1-\frac m2$ and $b>1-\frac k2$, then $\delta^\pi(x)-x$ is uniformly bounded.

Proof. Define $\delta^\pi(x\mid\lambda)$ to be the posterior mean with $\lambda$ given. From Theorem 3.10, we know that
$$
\sup_x\|\delta^\pi(x\mid\lambda)-x\|=K(\lambda)<\infty .
$$
With a modification of the proof of Theorem 3.10, it can be shown that $K(\lambda)$ is continuous. Also, as $\lambda\to\infty$, the posterior distribution converges to that corresponding to $\pi(\beta)=1$, so we know from Theorem 3.9 that $\lim_{\lambda\to\infty}K(\lambda)<\infty$. As $\lambda\to 0$, the posterior converges to the special case of Theorem 3.10 in which $A=0$, so we know $K(0)<\infty$. It follows that $K(\lambda)$ is itself bounded. Finally, letting $\pi(\lambda\mid x)$ denote the posterior distribution of $\lambda$ given $x$, which was shown to exist under the given conditions, it is clear that
$$
\|\delta^\pi(x)-x\|^2=\big\|E^{\pi(\lambda\mid x)}[\delta^\pi(x\mid\lambda)-x]\big\|^2\le E^{\pi(\lambda\mid x)}\|\delta^\pi(x\mid\lambda)-x\|^2\le E^{\pi(\lambda\mid x)}[K(\lambda)]^2 .
$$
Since $K(\lambda)$ is bounded, it follows that $\|\delta^\pi(x)-x\|$ is uniformly bounded, and the proof is complete. □

3.4. Admissibility and inadmissibility results. To prove admissibility or inadmissibility based on Results 3.1 and 3.2, we need only determine whether (25) is infinite or (26) is finite. Since Lemmas 3.5, 3.6 and 3.7 provide effectively equivalent upper and lower bounds on $m(x)$, it suffices to evaluate (25) and (26) for these equivalent bounds.

3.4.1. Case 1 scenario.

Theorem 3.12. Assume that $\pi(\beta)=1$, $m\ge 2$, $a_1<1$ and $\pi(H,D)$ satisfies Condition 1 with $l=0$. If $k=2$ and $a_2>1$, then the posterior mean is admissible under quadratic loss. If $\frac{3-m}{2}<a_2<\frac32-\frac1k$, then the posterior mean is inadmissible.

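Proof. Let $z_i=(z_{i1},z_{i2},\ldots,z_{ik})^t=H(x_i-\bar x)$. Define $y_j^2=\sum_{i=1}^m z_{ij}^2$. By Lemma 3.5,
$$
\begin{aligned}
m(x)&\approx C\iint \frac{1}{|D|^{a_1}|I+D|^{(m-1)/2+a_2-a_1}}\exp\Big(-\frac12\sum_{i=1}^m(x_i-\bar x)^tH^t(I+D)^{-1}H(x_i-\bar x)\Big)\,dD\,dH\\
&=C\iint\prod_{j=1}^k\Big[\frac{1}{d_j^{a_1}(1+d_j)^{(m-1)/2+a_2-a_1}}\exp\Big(-\frac{y_j^2}{2(1+d_j)}\Big)\Big]\,dD\,dH\\
&=C\int\prod_{j=1}^k\Big[\int_0^\infty\frac{1}{d_j^{a_1}(1+d_j)^{(m-1)/2+a_2-a_1}}\exp\Big(-\frac{y_j^2}{2(1+d_j)}\Big)\,dd_j\Big]\,dH .
\end{aligned}
$$
Applying Lemma 3.3 with $r=(m-1)/2+a_2-a_1$ and $a=a_1$ yields
$$
\int_0^\infty\frac{1}{d_j^{a_1}(1+d_j)^{(m-1)/2+a_2-a_1}}\exp\Big(-\frac{y_j^2}{2(1+d_j)}\Big)\,dd_j\approx C_1\min\{C_2,(y_j^2)^{(3-m)/2-a_2}\}.
$$
Thus
$$
m(x)\approx C\int\prod_{j=1}^k C_1\min\{C_2,(y_j^2)^{(3-m)/2-a_2}\}\,dH. \tag{39}
$$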



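To prove admissibility, note that
$$
\bar m(r)=\int_{\{x:\|x\|=r\}}m(x)\,d\phi(x)\le C\int\Big(\int_{\{x:\|x\|=r\}}\prod_{j=1}^k C_1\min\{C_2,(y_j^2)^{(3-m)/2-a_2}\}\,d\phi(x)\Big)\,dH. \tag{40}
$$
The inner integral, with respect to $\phi$, is essentially considering $x$ to be uniformly distributed on the surface of the sphere of radius $\|x\|=r$. Since $H$ is an orthonormal matrix, $((Hx_1)^t,(Hx_2)^t,\ldots,(Hx_m)^t)$ also has a uniform distribution on the surface of the sphere of radius $r$. From the result in Section 49, Subsection 1, of [24], it follows that, for each given $H$,
$$
\Big(\frac{y_1^2}{r^2},\ldots,\frac{y_k^2}{r^2},\,1-\sum_{i=1}^k\frac{y_i^2}{r^2}\Big)\sim\mathrm{Dirichlet}\Big(\frac{m-1}{2},\ldots,\frac{m-1}{2},\,\frac k2\Big).
$$
Thus,
$$
\bar m(r)\le C\int\int\prod_{i=1}^k\min\{C_2,(y_i^2)^{(3-m)/2-a_2}\}\prod_{i=1}^k\Big(\frac{y_i^2}{r^2}\Big)^{(m-3)/2}\Big(1-\sum_{i=1}^k\frac{y_i^2}{r^2}\Big)^{(k-2)/2}d\Big(\frac{y_1^2}{r^2}\Big)\cdots d\Big(\frac{y_k^2}{r^2}\Big)\,dH .
$$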
The inner integral is clearly constant over $H$ and can be dropped, along with the factor $(1-\sum_{i=1}^k y_i^2/r^2)^{(k-2)/2}\le 1$ (since $k\ge 2$). Then elimination of the range restriction on the $y_i^2$ yields
$$
\begin{aligned}
\bar m(r)&\le C\prod_{i=1}^k\int_0^\infty\min\{C_2,(y_i^2)^{[(3-m)/2-a_2]}\}\Big(\frac{y_i^2}{r^2}\Big)^{(m-3)/2}d\Big(\frac{y_i^2}{r^2}\Big)\\
&\le C\,r^{-k(m-1)}\prod_{i=1}^k\int_0^\infty\big[C_2^{-1}+(y_i^2)^{a_2-(3-m)/2}\big]^{-1}(y_i^2)^{(m-3)/2}\,d(y_i^2),
\end{aligned}
$$
the last inequality using the fact that $\min\{C_2,v\}\le 2(C_2^{-1}+v^{-1})^{-1}$. The final integrals are finite if $m\ge 2$ and $a_2>1$ (their integrands behave like $t^{(m-3)/2}$ near 0 and like $t^{-a_2}$ at infinity), so then
$$
\bar m(r)\le C\,r^{-k(m-1)} .
$$
Hence
$$
\int_c^\infty\big[r^{mk-1}\bar m(r)\big]^{-1}\,dr\ge\int_c^\infty\frac{1}{C\,r^{k-1}}\,dr,
$$
which is infinite if $k=2$. Since the conditions $k=2$ and $a_2>1$ also imply that $\delta^\pi(x)-x$ is bounded by Theorem 3.9, the proof of admissibility using Result 3.1 is complete.
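The Dirichlet representation taken from Section 49 of [24] can be checked by simulation. The sketch below assumes NumPy, draws $x$ uniformly on the unit sphere in $\mathbb{R}^{mk}$ (normalized Gaussian vectors), forms $z_i=H(x_i-\bar x)$ for a fixed random orthonormal $H$, and compares the empirical mean and variance of $u_j=y_j^2/r^2$ with those of the Dirichlet$((m-1)/2,\ldots,(m-1)/2,k/2)$ marginal, a Beta$((m-1)/2,(mk-m+1)/2)$ law. The values of $m$, $k$, the sample size and the seed are illustrative assumptions.

```python
import numpy as np

# Monte Carlo check of the Dirichlet fact behind (40); m, k, N and the seed are illustrative.
rng = np.random.default_rng(1)
m, k, N = 4, 3, 200_000

# x uniform on the unit sphere in R^{mk}, stored as N draws of an m x k array (rows x_i^t).
x = rng.standard_normal((N, m, k))
x /= np.sqrt((x ** 2).sum(axis=(1, 2), keepdims=True))

# Fixed orthonormal H; row i of z is (x_i - xbar)^t H^t, i.e. z_i = H (x_i - xbar).
H, _ = np.linalg.qr(rng.standard_normal((k, k)))
xbar = x.mean(axis=1, keepdims=True)
z = (x - xbar) @ H.T
u = (z ** 2).sum(axis=1)                 # u_j = y_j^2 / r^2 with r = 1, shape (N, k)

alpha = np.array([(m - 1) / 2] * k + [k / 2])
a0 = alpha.sum()                         # equals mk/2
mean_theory = alpha[0] / a0              # (m - 1)/(mk)
var_theory = alpha[0] * (a0 - alpha[0]) / (a0 ** 2 * (a0 + 1))

print(u.mean(axis=0), mean_theory)
print(u.var(axis=0), var_theory)

# The last Dirichlet coordinate is m ||H xbar||^2, and all coordinates sum to 1.
rest = m * (xbar[:, 0, :] ** 2).sum(axis=1)
print(np.abs(u.sum(axis=1) + rest - 1).max())
```

The last line reflects the decomposition $\|x\|^2=\sum_i\|x_i-\bar x\|^2+m\|\bar x\|^2$, which is why the remaining Dirichlet coordinate carries parameter $k/2$.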

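To prove inadmissibility, note from (39) that
$$
\underline m(r)=\int_{\{x:\|x\|=r\}}\frac{1}{m(x)}\,d\phi(x)\le\int_{\{x:\|x\|=r\}}\Big[\int\prod_{j=1}^k C_1\min\{C_2,(y_j^2)^{(3-m)/2-a_2}\}\,dH\Big]^{-1}d\phi(x). \tag{41}
$$
Note that
$$
\Big[\int f(H)\,dH\Big]^{-1}\le\int[f(H)]^{-1}\,dH\quad\text{if } f(H)>0
$$
(Jensen's inequality for the convex function $t\mapsto 1/t$, with $dH$ normalized to total mass 1), so that
$$
\begin{aligned}
\underline m(r)&\le C_1\int\int_{\{x:\|x\|=r\}}\Big(\prod_{j=1}^k\min\{C_2,(y_j^2)^{(3-m)/2-a_2}\}\Big)^{-1}d\phi(x)\,dH\\
&\le C_1\int\int_{\{x:\|x\|=r\}}\prod_{j=1}^k\max\{C_2,(y_j^2)^{a_2-(3-m)/2}\}\,d\phi(x)\,dH .
\end{aligned}\tag{42}
$$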
Continuing exactly as in the proof of admissibility (but employing the bound $y_i^2\le r^2$) yields
$$
\begin{aligned}
\underline m(r)&\le C\prod_{i=1}^k\int_0^{r^2}\max\{C_2,(y_i^2)^{a_2-(3-m)/2}\}\Big(\frac{y_i^2}{r^2}\Big)^{(m-3)/2}d\Big(\frac{y_i^2}{r^2}\Big)\\
&\le C\prod_{i=1}^k\int_0^{r^2}\big[C_2+(y_i^2)^{a_2-(3-m)/2}\big]\Big(\frac{y_i^2}{r^2}\Big)^{(m-3)/2}d\Big(\frac{y_i^2}{r^2}\Big)\\
&\le C+C_2\,r^{k(2a_2+m-3)} .
\end{aligned}\tag{43}
$$
Hence
$$
\int_c^\infty r^{(1-mk)}\,\underline m(r)\,dr\le C+C_2\int_c^\infty r^{(2ka_2-3k+1)}\,dr,
$$
which is finite if $a_2<\frac32-\frac1k$. If $m\ge 2$, $a_1<1$ and $a_2>(3-m)/2$, then $\delta^\pi(x)-x$ is uniformly bounded, and Result 3.2 completes the proof of inadmissibility. (It was not strictly necessary to establish the uniform boundedness condition for inadmissibility, but it is necessary to verify that the posterior mean exists, and the uniform boundedness condition clearly establishes that this is so.) □

Theorem 3.12 fails to cover the situation in which $k=2$ and $a_2=1$ and the situation $k\ge 3$ and $a_2\ge\frac32-\frac1k$. We suspect that the posterior mean is also inadmissible in these two situations, but were unable to prove it. (The main hurdle is to find a way to avoid use of the too-strong inequality $[\int f(H)\,dH]^{-1}\le\int[f(H)]^{-1}\,dH$.)

3.4.2. Case 2 scenario.

Theorem 3.13. Assume that $\pi(\beta)$ is $N_k(\beta^0,A)$, $a_1<1$, $k\ge 2$ and $\pi(H,D)$ satisfies Condition 1 with $l=0$. If $a_2\ge 1-\frac1k$, then the posterior mean is admissible. If $\frac{2-m}{2}<a_2<1-\frac1k$, then the posterior mean is inadmissible.

Proof. Let $z_i=(z_{i1},z_{i2},\ldots,z_{ik})^t=Hx_i$. Define $y_j^2=\sum_{i=1}^m z_{ij}^2$. By (33) we have
$$
\begin{aligned}
m(x)&\approx C_1\iint\frac{1}{|D|^{a_1}|C_2I+D|^{m/2+a_2-a_1}}\exp\Big(-\frac12\sum_{i=1}^m x_i^tH^t(C_3I+D)^{-1}Hx_i\Big)\,dD\,dH\\
&=C_1\int\prod_{j=1}^k\Big[\int_0^\infty\frac{1}{d_j^{a_1}(C_2+d_j)^{m/2+a_2-a_1}}\exp\Big(-\frac{y_j^2}{2(C_3+d_j)}\Big)\,dd_j\Big]\,dH .
\end{aligned}
$$
Applying Lemma 3.3 with $r=a_2-a_1+m/2$ and $a=a_1$ to the inner integral above yields
$$
\int_0^\infty\frac{1}{d_j^{a_1}(C_2+d_j)^{m/2+a_2-a_1}}\exp\Big(-\frac{y_j^2}{2(C_3+d_j)}\Big)\,dd_j\approx C_1^*\min\{C_2^*,(y_j^2)^{1-m/2-a_2}\}.
$$
Thus
$$
m(x)\approx\int\prod_{j=1}^k C_1^*\min\{C_2^*,(y_j^2)^{1-m/2-a_2}\}\,dH. \tag{44}
$$
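To prove admissibility, note that
$$
\bar m(r)=\int_{\{x:\|x\|=r\}}m(x)\,d\phi(x)\approx\int\int_{\{x:\|x\|=r\}}\prod_{j=1}^k C_1^*\min\{C_2^*,(y_j^2)^{1-m/2-a_2}\}\,d\phi(x)\,dH. \tag{45}
$$
The inner integral with respect to $\phi$ is essentially considering $x$ to be uniformly distributed on the surface of the sphere of radius $\|x\|=r$. Since $H$ is an orthonormal matrix, $((Hx_1)^t,(Hx_2)^t,\ldots,(Hx_m)^t)$ also has a uniform distribution on the surface of the sphere of radius $r$. From the result in Section 49, Subsection 1, of [24], it follows that, for each given $H$,
$$
\Big(\frac{y_1^2}{r^2},\ldots,\frac{y_k^2}{r^2}\Big)\sim\mathrm{Dirichlet}\Big(\frac m2,\ldots,\frac m2\Big).
$$
Thus,
$$
\bar m(r)\le C\int\int\prod_{i=1}^k\min\{C_2^*,(y_i^2)^{1-m/2-a_2}\}\prod_{i=1}^k\Big(\frac{y_i^2}{r^2}\Big)^{m/2-1}d\Big(\frac{y_1^2}{r^2}\Big)\cdots d\Big(\frac{y_k^2}{r^2}\Big)\,dH .
$$
Again dropping the integral over $H$ and using the inequality $\min\{C_2^*,v\}\le 2((C_2^*)^{-1}+v^{-1})^{-1}$ results in the bound
$$
\begin{aligned}
\bar m(r)&\le C\prod_{i=1}^k\int_0^{r^2}\big[(C_2^*)^{-1}+(y_i^2)^{-(1-m/2-a_2)}\big]^{-1}\Big(\frac{y_i^2}{r^2}\Big)^{m/2-1}d\Big(\frac{y_i^2}{r^2}\Big)\\
&\le C\,r^{-km}\prod_{i=1}^k\int_0^{r^2}\big((C_2^*)^{-1}+v^{a_2+m/2-1}\big)^{-1}v^{(m-2)/2}\,dv .
\end{aligned}
$$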
The order of the integral in the last expression is easily seen to be $O(r^{2(1-a_2)})$ if $a_2<1$; $O(\log r)$ if $a_2=1$; and $O(1)$ if $a_2>1$ (its integrand behaves like $v^{-a_2}$ for large $v$). Hence
$$
\int_c^\infty\big[r^{mk-1}\bar m(r)\big]^{-1}\,dr\ge
\begin{cases}
C\int_c^\infty r^{1-2k(1-a_2)}\,dr, & \text{if } a_2<1,\\
C\int_c^\infty r(\log r)^{-k}\,dr, & \text{if } a_2=1,\\
C\int_c^\infty r\,dr, & \text{if } a_2>1.
\end{cases}
$$
This is clearly infinite if $a_2\ge 1-1/k$. By Theorem 3.10, this condition also implies that $\delta^\pi(x)-x$ is bounded, so use of Result 3.1 completes the proof of admissibility.

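To prove inadmissibility, note from (44) and the fact $[\int f(H)\,dH]^{-1}\le\int[f(H)]^{-1}\,dH$ that
$$
\begin{aligned}
\underline m(r)&=\int_{\{x:\|x\|=r\}}\frac{1}{m(x)}\,d\phi(x)\le\int_{\{x:\|x\|=r\}}\Big(\int\prod_{j=1}^k C_1^*\min\{C_2^*,(y_j^2)^{1-m/2-a_2}\}\,dH\Big)^{-1}d\phi(x)\\
&\le C\int\int_{\{x:\|x\|=r\}}\prod_{j=1}^k\max\{C_2,(y_j^2)^{a_2+(m-2)/2}\}\,d\phi(x)\,dH .
\end{aligned}
$$
Continuing as with the proof of admissibility, one obtains
$$
\begin{aligned}
\underline m(r)&\le C\prod_{i=1}^k\int_0^{r^2}\max\{C_2,(y_i^2)^{a_2+(m-2)/2}\}\Big(\frac{y_i^2}{r^2}\Big)^{(m-2)/2}d\Big(\frac{y_i^2}{r^2}\Big)\\
&\le C\prod_{i=1}^k\int_0^{r^2}\big[C_2+(y_i^2)^{a_2+(m-2)/2}\big]\Big(\frac{y_i^2}{r^2}\Big)^{(m-2)/2}d\Big(\frac{y_i^2}{r^2}\Big)\\
&\le C+C_2\,r^{k(2a_2+m-2)} .
\end{aligned}
$$
Hence
$$
\int_c^\infty r^{(1-mk)}\,\underline m(r)\,dr\le C+C_2\int_c^\infty r^{2ka_2-2k+1}\,dr,
$$
which is finite if $a_2<1-1/k$. If $a_1<1$ and $a_2>(2-m)/2$, then $\delta^\pi(x)-x$ is uniformly bounded and so the posterior mean exists, and Result 3.2 completes the proof of inadmissibility. □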



3.4.3. Case 3 scenario. 
Theorem 3.14. Assume that $\pi(\beta)$ is $N_k(\beta^0,\lambda A)$, $m\ge 2$, $a_1<1$, $\pi(H,D)$ satisfies Condition 1 with $l=0$ and $\pi(\lambda)$ satisfies Condition 2. If (i) $k\ge 2$, $a_2\ge 1-\frac1k$ and $b>1$; or (ii) $k\ge 3$, $a_2>1-\frac bk$ and $0\le b<1$; or (iii) $k=2$, $a_2>1-\frac b2$ and $0<b<1$, then the posterior mean is admissible under quadratic loss.



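Proof. Starting with (35) from Lemma 3.7 (setting all constants to 1 for notational simplicity) yields
$$
m(x)\approx\iiint\prod_{j=1}^k\Big[\frac{1}{d_j^{a_1}(1+d_j)^{(m-1)/2+a_2-a_1}(1+\lambda+d_j)^{1/2}}\Big]
\exp\Big(-\frac12\sum_{j=1}^k\Big[\frac{\sum_{i=1}^m(H(x_i-\bar x))_j^2}{1+d_j}+\frac{m\,(H\bar x)_j^2}{1+\lambda+d_j}\Big]\Big)\,\pi(\lambda)\,d\lambda\,dD\,dH .
$$
Let
$$
w_j=\frac{\sum_{i=1}^m(H(x_i-\bar x))_j^2}{\|x\|^2},\qquad v_j=\frac{m\,(H\bar x)_j^2}{\|x\|^2},\qquad j=1,2,\ldots,k.
$$
Under $\phi(x)$, the uniform distribution on the surface of the sphere of radius $r=\|x\|$, by the result in Section 49, Subsection 1, of [24], we have
$$
(w_1,\ldots,w_k,v_1,\ldots,v_k)\sim\mathrm{Dirichlet}\Big(\frac{m-1}{2},\ldots,\frac{m-1}{2},\frac12,\ldots,\frac12\Big).
$$
Thus, arguing as in previous theorems, dropping $H$ and letting the $w_j$ and $v_j$ range freely over $(0,1)$, yields
$$
\bar m(r)=\int m(x)\,d\phi(x)\le\int\prod_{j=1}^k\Big[\int_0^\infty\!\!\int_0^1\!\!\int_0^1\frac{w_j^{(m-1)/2-1}v_j^{-1/2}}{d_j^{a_1}(1+d_j)^{(m-1)/2+a_2-a_1}(1+\lambda+d_j)^{1/2}}
\exp\Big(-\frac{r^2}{2}\Big[\frac{w_j}{1+d_j}+\frac{v_j}{1+\lambda+d_j}\Big]\Big)\,dw_j\,dv_j\,dd_j\Big]\,\pi(\lambda)\,d\lambda .
$$
Make the change of variables $s_j=w_j/(1+d_j)$ and $t_j=v_j/(1+d_j+\lambda)$, $j=1,2,\ldots,k$. The region of integration becomes
$$
R=\Big\{s_j,t_j:\ 0<s_j<\frac{1}{1+d_j},\ 0<t_j<\frac{1}{1+d_j+\lambda},\ j=1,2,\ldots,k\Big\}.
$$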
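Then
$$
\begin{aligned}
\bar m(r)&\le\int\prod_{j=1}^k\Big[\iiint_R\frac{((1+d_j)s_j)^{(m-3)/2}((1+d_j+\lambda)t_j)^{-1/2}}{d_j^{a_1}(1+d_j)^{(m-1)/2+a_2-a_1}(1+\lambda+d_j)^{1/2}}
\exp\Big(-\frac{r^2}{2}(s_j+t_j)\Big)(1+d_j)(1+d_j+\lambda)\,ds_j\,dt_j\,dd_j\Big]\pi(\lambda)\,d\lambda\\
&=\int\prod_{j=1}^k\Big[\iiint_R\frac{s_j^{(m-3)/2}t_j^{-1/2}}{d_j^{a_1}(1+d_j)^{a_2-a_1}}\exp\Big(-\frac{r^2}{2}(s_j+t_j)\Big)\,ds_j\,dt_j\,dd_j\Big]\pi(\lambda)\,d\lambda\\
&=\int\prod_{j=1}^k\Big[\int_0^\infty\frac{1}{d_j^{a_1}(1+d_j)^{a_2-a_1}}\Big(\int_0^{1/(1+d_j)}s_j^{(m-3)/2}e^{-r^2s_j/2}\,ds_j\Big)\Big(\int_0^{1/(1+d_j+\lambda)}t_j^{-1/2}e^{-r^2t_j/2}\,dt_j\Big)dd_j\Big]\pi(\lambda)\,d\lambda .
\end{aligned}
$$
Applying Lemma 3.3(b) to the inner integrals above yields
$$
\bar m(r)\le C\int\prod_{j=1}^k\Big[\int_0^\infty\frac{1}{d_j^{a_1}(1+d_j)^{a_2-a_1}}\min\{r^{-(m-1)},(1+d_j)^{-(m-1)/2}\}\min\{r^{-1},(1+d_j+\lambda)^{-1/2}\}\,dd_j\Big]\pi(\lambda)\,d\lambda . \tag{46}
$$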



Consider first the situation $b>1$. Then $\pi(\lambda)$ has finite mass and so [using $(1+d_j+\lambda)^{-1/2}\le(1+d_j)^{-1/2}$]
$$
\bar m(r)\le C\Big[\int_0^\infty\frac{1}{d^{a_1}(1+d)^{a_2-a_1}}\min\{r^{-(m-1)},(1+d)^{-(m-1)/2}\}\min\{r^{-1},(1+d)^{-1/2}\}\,dd\Big]^k .
$$
Break up the inner integral into integrals $I_1$ and $I_2$ over $(0,r^2-1)$ and $(r^2-1,\infty)$, respectively. Then, since $a_1<1$,
$$
I_1=\int_0^{r^2-1}\frac{1}{d^{a_1}(1+d)^{a_2-a_1}}\,\frac{1}{r^{(m-1)}}\,\frac1r\,dd\le C\,r^{-m}\big(1+r^{2(1-a_2)}\big),
$$
$$
I_2=\int_{r^2-1}^\infty\frac{1}{d^{a_1}(1+d)^{a_2-a_1}}\,\frac{1}{(1+d)^{(m-1)/2}}\,\frac{1}{(1+d)^{1/2}}\,dd\le C\,r^{2-2a_2-m}.
$$
Hence
$$
\bar m(r)\le C[I_1+I_2]^k\le C\,r^{-mk}\big(1+r^{2(1-a_2)}\big)^k
$$
and
$$
\int_c^\infty\big[r^{mk-1}\bar m(r)\big]^{-1}\,dr\ge\int_c^\infty\frac{r}{C\,(1+r^{2(1-a_2)})^k}\,dr .
$$
This is infinite if $a_2\ge 1-1/k$.

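Next consider the case $0\le b<1$. Clearly, for any $\varepsilon>0$,
$$
\begin{aligned}
\min\{r^{-1},(1+d_j+\lambda)^{-1/2}\}
&=\min\{r^{-1},(1+d_j+\lambda)^{-1/2}\}^{2(1-b)+\varepsilon}\times\min\{r^{-1},(1+d_j+\lambda)^{-1/2}\}^{2b-1-\varepsilon}\\
&\le(1+\lambda)^{b-1-\varepsilon/2}\min\{r^{-1},(1+d_j)^{-1/2}\}^{2b-1-\varepsilon}.
\end{aligned}
$$
Hence (46) can be bounded as
$$
\begin{aligned}
\bar m(r)\le C&\Big[\int_0^\infty\frac{1}{d^{a_1}(1+d)^{a_2-a_1}}\min\{r^{-(m-1)},(1+d)^{-(m-1)/2}\}\min\{r^{-1},(1+d)^{-1/2}\}\,dd\Big]^{k-1}\\
&\times\int_0^\infty\frac{1}{d^{a_1}(1+d)^{a_2-a_1}}\min\{r^{-(m-1)},(1+d)^{-(m-1)/2}\}\min\{r^{-1},(1+d)^{-1/2}\}^{(2b-1-\varepsilon)}\,dd
\end{aligned}
$$
[using the fact that $(1+\lambda)^{(b-1-\varepsilon/2)}\pi(\lambda)$ has finite mass]. Proceeding exactly as in the $b>1$ case yields
$$
\bar m(r)\le C\,r^{(2k-2ka_2-km+2(1-b)+\varepsilon)},
$$
so that
$$
\int_c^\infty\big[r^{mk-1}\bar m(r)\big]^{-1}\,dr\ge C\int_c^\infty r^{-(2k-2ka_2+1-2b+\varepsilon)}\,dr,
$$
which is infinite if $a_2\ge 1-\frac bk+\varepsilon'$, with $\varepsilon'=\varepsilon/(2k)$. Since $\varepsilon'$ was arbitrary, the condition for admissibility when $0\le b<1$ is $a_2>1-\frac bk$. By Theorem 3.11 these conditions also imply that $\delta^\pi(x)-x$ is uniformly bounded, except when $k=2$, in which case the restriction $b>0$ must be added. This completes the proof of admissibility. □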
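Theorems 3.12-3.14 can be summarized as a small lookup table. The hypothetical helper below (the name `verdict` is ours) assumes $l=0$, $a_1<1$ and $m\ge 2$ throughout, as in the theorems, and returns `"open"` wherever the theorems are silent; in Case 3 only the sufficient conditions of Theorem 3.14 are encoded, using the reading $0\le b<1$ for part (ii) given in the statement above.

```python
from fractions import Fraction as F

def verdict(case, a2, k, m, b=None):
    """Summary of Theorems 3.12-3.14, assuming l = 0, a1 < 1 and m >= 2.

    Returns "admissible", "inadmissible", or "open" where the theorems say nothing.
    For Case 3 only the sufficient conditions of Theorem 3.14 are encoded.
    """
    if case == 1:                       # pi(beta) = 1 (Theorem 3.12)
        if k == 2 and a2 > 1:
            return "admissible"
        if F(3 - m, 2) < a2 < F(3, 2) - F(1, k):
            return "inadmissible"
        return "open"
    if case == 2:                       # beta ~ N_k(beta0, A) (Theorem 3.13)
        if a2 >= 1 - F(1, k):
            return "admissible"
        if a2 > F(2 - m, 2):
            return "inadmissible"
        return "open"
    if case == 3:                       # beta ~ N_k(beta0, lambda A) (Theorem 3.14)
        if b > 1 and a2 >= 1 - F(1, k):
            return "admissible"
        if k >= 3 and 0 <= b < 1 and a2 > 1 - b / k:
            return "admissible"
        if k == 2 and 0 < b < 1 and a2 > 1 - b / 2:
            return "admissible"
        return "open"
    raise ValueError("case must be 1, 2 or 3")

print(verdict(1, F(3, 2), k=2, m=3))             # Case 1, k = 2, a2 > 1
print(verdict(1, F(1), k=2, m=3))                # boundary left open after Theorem 3.12
print(verdict(2, F(1), k=3, m=3))                # Case 2 with a2 = 1
print(verdict(3, F(5, 6), k=3, m=4, b=F(1, 2)))  # a2 = (2k-1)/(2k) with b = 1/2
```

For example, the Case 1 boundary $k=2$, $a_2=1$ is reported as open, matching the gap noted after Theorem 3.12, and in Case 3 with $b=1/2$, $k=3$ the boundary value $a_2=1-b/k=5/6$ is also reported as open.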
3.4.4. Admissibility and inadmissibility for the common priors. Let us apply these results to the versions of the reference prior discussed in the Introduction. For $\beta$, the Case 1 constant prior leads to admissibility only in the case $k=2$, and hence is not a prior we recommend. The Case 2 conjugate prior can readily yield admissible estimators, and is certainly reasonable if backed by subjective knowledge. The Case 3 default prior that was suggested in Section 1.2 is
$$
\pi(\beta)\propto\big[1+\|\beta\|^2\big]^{-(k-1)/2}, \tag{47}
$$
corresponding to the two-stage prior $\beta\mid\lambda\sim N_k(0,\lambda I)$, $\pi(\lambda)\propto\lambda^{-1/2}e^{-1/(2\lambda)}$. We therefore focus on admissibility results when this prior is used for $\beta$.
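The correspondence between (47) and the two-stage prior follows by integrating out $\lambda$: with $c=(1+\|\beta\|^2)/2$, $\int_0^\infty(2\pi\lambda)^{-k/2}\lambda^{-1/2}e^{-c/\lambda}\,d\lambda=(2\pi)^{-k/2}\,\Gamma(\frac{k-1}{2})\,c^{-(k-1)/2}$, which is proportional to $[1+\|\beta\|^2]^{-(k-1)/2}$. The sketch below confirms this numerically; the unnormalized $\pi(\lambda)$, the quadrature grid and the choice $k=3$ are illustrative assumptions.

```python
import math

def beta_marginal(beta_sq, k, h=0.01, s_min=-30.0, s_max=60.0):
    """Marginal density of beta at ||beta||^2 = beta_sq under
    beta | lam ~ N_k(0, lam I) and pi(lam) = lam^(-1/2) exp(-1/(2 lam)) (unnormalized),
    by the trapezoid rule after substituting lam = exp(s)."""
    n = int(round((s_max - s_min) / h))
    total = 0.0
    for i in range(n + 1):
        lam = math.exp(s_min + i * h)
        normal = (2 * math.pi * lam) ** (-k / 2) * math.exp(-beta_sq / (2 * lam))
        prior = lam ** -0.5 * math.exp(-1 / (2 * lam))
        val = normal * prior * lam          # Jacobian d(lam) = lam ds
        total += 0.5 * val if i in (0, n) else val
    return total * h

def closed_form(beta_sq, k):
    """(2 pi)^(-k/2) Gamma((k-1)/2) ((1 + ||beta||^2)/2)^(-(k-1)/2), valid for k >= 2."""
    c = (1 + beta_sq) / 2
    return (2 * math.pi) ** (-k / 2) * math.gamma((k - 1) / 2) * c ** (-(k - 1) / 2)

k = 3
for beta_sq in (0.0, 1.0, 10.0):
    ratio = beta_marginal(beta_sq, k) / beta_marginal(0.0, k)
    print(beta_sq, ratio, (1 + beta_sq) ** (-(k - 1) / 2),
          beta_marginal(beta_sq, k) / closed_form(beta_sq, k))
```

Since $\pi(\lambda)\propto\lambda^{-1/2}$ for large $\lambda$, this prior has $b=1/2$ in the sense of Condition 2, which is the value used below.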

In regard to priors for $V$, note first that the nonhierarchical reference prior for $V$ cannot be considered, since it corresponds to $a_1=1$, yielding an improper posterior. The modification
$$
\pi(V)=\frac{1}{|V|^{a_1}\prod_{i<j}(d_i-d_j)},
$$
where $a_1<1$, is inadmissible in Case 1 [$\pi(\beta)=1$], but is admissible in Case 2, and is admissible in Case 3 when $b>1$ and $1-1/k<a_1<1$, or when $0<b<1$ and $1-b/k<a_1<1$. Since we recommend (47), which has $b=1/2$, this suggests the choice $a_1=1-1/(2k)=(2k-1)/(2k)$. While we were not strictly able to prove admissibility for this choice, it likely corresponds to admissibility and, in any case, being at the boundary of admissibility has considerable appeal.

The modified reference prior of the form 

tt(V) 



is admissible in Case 1 if $k = 2$ and $a_2 > 1$; in Case 2 or Case 3 ($b \ge 1$) if
$a_2 > 1 - 1/k$; and in Case 3 ($0 < b < 1$) if $a_2 > 1 - b/k$. The natural choice
is $a_2 = 1$, since this is admissible for all $b$ and $k$ in Cases 2 and 3, and is
almost admissible in Case 1 when $k = 2$. (Recall that we were unable to
establish admissibility or inadmissibility in this case, but again being at the
boundary of admissibility has considerable appeal.) Recalling the discussion
from the Introduction, the recommended default prior distribution of this
form is thus

$$\pi(V) \propto \frac{1}{|I + V|\prod_{i<j}(d_i - d_j)}.$$



This yields a proper posterior with a posterior mean that is admissible in 
estimation under quadratic loss. 
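The $a_2$ conditions and the recommended default priors can be sketched in the same way. This is a minimal Python sketch under stated assumptions: the helper names are hypothetical; $d_1 \ge \cdots \ge d_k$ are taken to be the eigenvalues of $V$, so that $|I + V| = \prod_i (1 + d_i)$; and the log-densities are unnormalized, since both priors are improper.

```python
import math
from fractions import Fraction
from typing import Optional, Sequence


def a2_conditions_hold(case: int, k: int, a2: Fraction,
                       b: Optional[Fraction] = None) -> bool:
    """True when the text states admissibility for the |I + V|^{a2} prior."""
    if case == 1:
        return k == 2 and a2 > 1
    if case == 2:
        return a2 > 1 - Fraction(1, k)
    if case == 3:
        if b is None or b <= 0:
            raise ValueError("Case 3 needs b > 0")
        return a2 > (1 - Fraction(1, k) if b >= 1 else 1 - b / k)
    raise ValueError("case must be 1, 2, or 3")


def log_prior_beta(beta: Sequence[float]) -> float:
    """Unnormalized log of (47): -(k - 1)/2 * log(1 + ||beta||^2)."""
    k = len(beta)
    return -0.5 * (k - 1) * math.log1p(sum(x * x for x in beta))


def log_prior_V(eigenvalues: Sequence[float]) -> float:
    """Unnormalized log of 1 / (|I + V| prod_{i<j} (d_i - d_j)) from V's eigenvalues."""
    d = sorted(eigenvalues, reverse=True)        # d_1 >= d_2 >= ... >= d_k
    if d[-1] < 0:
        raise ValueError("a covariance matrix has no negative eigenvalues")
    k = len(d)
    gaps = [d[i] - d[j] for i in range(k) for j in range(i + 1, k)]
    if any(g == 0 for g in gaps):
        return math.inf                          # density is unbounded at tied eigenvalues
    log_det = sum(math.log1p(x) for x in d)      # log|I + V| = sum_i log(1 + d_i)
    return -(log_det + sum(math.log(g) for g in gaps))


print(a2_conditions_hold(3, 4, Fraction(1), Fraction(1, 2)))  # True
print(a2_conditions_hold(1, 2, Fraction(1)))                  # False: boundary case
print(log_prior_V([3.0, 1.0]) == -math.log(16))               # compare to -log(4 * 2 * 2)
```

Working on the log scale and through the eigenvalues avoids forming $|I+V|$ and the eigenvalue-gap product directly, which can overflow or underflow for moderate $k$.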
APPENDIX



Proof of Lemma 3.3. (a) It suffices to take $c_1 = c_2 = 1$ in the proof.
This is because $(c_1 + d)/(c_2 + d)$ is uniformly bounded above and below, so
that one could change $(c_1 + d)$ to $(c_2 + d)$, or vice versa. A simple change of
variables then reduces the expression to the case $c_2 = 1$. Clearly,

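$$
\begin{aligned}
f(v) &= \int_0^\infty \frac{1}{(1+d)^r d^a}\exp\left(-\frac{v}{2(1+d)}\right) dd \\
&\le e^{-v/4}\int_0^1 \frac{1}{d^a}\,dd + \int_1^\infty \frac{1}{d^{r+a}}\exp\left(-\frac{v}{4d}\right) dd \\
&= \frac{1}{1-a}\,e^{-v/4} + \int_1^\infty \frac{1}{d^{r+a}}\exp\left(-\frac{v}{4d}\right) dd.
\end{aligned}
$$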

Making the change of variables $t = v/d$ yields

$$
\begin{aligned}
f(v) &\le \frac{1}{1-a}\,e^{-v/4} + v^{1-r-a}\int_0^v t^{(r+a-1)-1}e^{-t/4}\,dt \\
&\le \frac{1}{1-a}\,e^{-v/4} + v^{1-r-a}\int_0^\infty t^{(r+a-1)-1}e^{-t/4}\,dt.
\end{aligned}
$$



Since $r + a > 1$, the last integral is finite; moreover, $v^{r+a-1}e^{-v/4}$ is bounded on
$(0, \infty)$, so $e^{-v/4} \le C v^{1-r-a}$ for all $v > 0$. Therefore

$$f(v) \le C_1 v^{1-r-a}.$$

On the other hand, $f(v)$ is a decreasing function of $v$ for $v \ge 0$, so

$$\max_{v \ge 0} f(v) = f(0) = \int_0^\infty \frac{1}{(1+d)^r d^a}\,dd \le \int_0^1 \frac{1}{d^a}\,dd + \int_1^\infty \frac{1}{d^{r+a}}\,dd = \frac{1}{1-a} + \frac{1}{r+a-1} = C_2.$$

Thus, defining $C_3 = C_2/C_1$,

$$f(v) \le C_1\min\{C_3,\ v^{1-r-a}\}.$$
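To find a lower bound for $f(v)$, note that, since $1 + d \le 2d$ for $d \ge 1$,

$$f(v) \ge \int_1^\infty \frac{1}{(2d)^r d^a}\exp\left(-\frac{v}{2d}\right) dd.$$

Making the change of variables $t = v/d$, one obtains

$$f(v) \ge \frac{v^{1-r-a}}{2^r}\int_0^v t^{r+a-2}\exp\left(-\frac{t}{2}\right) dt.$$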

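If $v \ge 1$, then

$$f(v) \ge \frac{v^{1-r-a}}{2^r}\int_0^1 t^{r+a-2}\exp\left(-\frac{t}{2}\right) dt = C_1' v^{1-r-a}.$$

If $0 < v < 1$, then

$$f(v) \ge f(1) = \int_0^\infty \frac{1}{(1+d)^r d^a}\exp\left(-\frac{1}{2(1+d)}\right) dd = C_2'.$$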

Let $C_3' = \min\{C_1', C_2'\}$. Thus

$$f(v) \ge \min\{C_3',\ C_1' v^{1-r-a}\} = C_1'\min\{C_4',\ v^{1-r-a}\},$$

where $C_4' = C_3'/C_1'$, completing the proof of part (a).
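The two-sided bound in part (a) can be checked numerically. This is a minimal Python sketch, assuming the reduced integrand $(1+d)^{-r}d^{-a}\exp\{-v/[2(1+d)]\}$ used in the proof ($c_1 = c_2 = 1$) and the hypothetical illustrative values $a = 1/2$, $r = 1$, which satisfy $a < 1 < r + a$. Two substitutions remove the endpoint singularities, so plain composite Simpson quadrature on $[0, 1]$ suffices.

```python
import math

A, R = 0.5, 1.0   # hypothetical illustrative values with A < 1 < R + A


def simpson(g, n=2000):
    """Composite Simpson's rule for the integral of g over [0, 1]; n must be even."""
    h = 1.0 / n
    total = g(0.0) + g(1.0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3


def f(v, a=A, r=R, n=2000):
    """f(v) = int_0^inf (1+d)^(-r) d^(-a) exp(-v / (2(1+d))) dd."""
    c = r + a - 1.0

    # d in (0, 1]: put d = s**(1/(1-a)); then d^(-a) dd = ds / (1 - a).
    def head(s):
        d = s ** (1.0 / (1.0 - a))
        return (1.0 + d) ** (-r) * math.exp(-v / (2.0 * (1.0 + d)))

    # d in [1, inf): put d = 1/u, then u = s**(1/c); u^(c-1) du becomes ds / c.
    def tail(s):
        u = s ** (1.0 / c)
        return (1.0 + u) ** (-r) * math.exp(-v * u / (2.0 * (1.0 + u)))

    return simpson(head, n) / (1.0 - a) + simpson(tail, n) / c


C2 = 1 / (1 - A) + 1 / (R + A - 1)   # the upper bound on f(0) from the proof
print(f(0.0), C2)
for v in [0.1, 1.0, 10.0, 100.0, 1000.0]:
    ratio = f(v) / min(1.0, v ** (1 - R - A))
    print(f"v = {v:7.1f}   f(v) = {f(v):.5f}   f(v) / min(1, v^(1-r-a)) = {ratio:.3f}")
```

For these values, $f(0) = \int_0^\infty d^{-1/2}(1+d)^{-1}\,dd = \pi$, which is below $C_2 = 4$. As $v \to \infty$, $v^{r+a-1}f(v) \to \int_0^\infty t^{r+a-2}e^{-t/2}\,dt = \sqrt{2\pi}$, consistent with the $v^{1-r-a}$ rate. The printed ratio stays between fixed positive constants, as part (a) asserts.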

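To prove part (b), change variables from $t$ to $w = vt$. Then

$$g(\mu, v) = v^{-(a+1)}\int_0^{\mu v} w^a e^{-w}\,dw.$$

Now $\int_0^{\mu v} w^a e^{-w}\,dw \le \Gamma(a+1)$ and $\int_0^{\mu v} w^a e^{-w}\,dw \le \int_0^{\mu v} w^a\,dw = (\mu v)^{a+1}/(a+1)$. Hence

$$g(\mu, v) \le \min\left\{\Gamma(a+1)\,v^{-(a+1)},\ \frac{\mu^{a+1}}{a+1}\right\},$$

and the result follows. □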
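The bound in part (b) can likewise be spot-checked. This is a minimal Python sketch, assuming $g(\mu, v) = \int_0^\mu t^a e^{-vt}\,dt$ (the form implied by the change of variables $w = vt$). It also assumes the hypothetical restriction $a \ge 1$, made only so that the integrand is smooth at $t = 0$ and plain Simpson quadrature is accurate. For $a = 2$, the closed form $g(\mu, v) = (2/v^3)\,[1 - e^{-\mu v}(1 + \mu v + (\mu v)^2/2)]$ provides an independent check of the quadrature.

```python
import math


def g(mu, v, a, n=4000):
    """g(mu, v) = int_0^mu t^a exp(-v t) dt by composite Simpson's rule (n even)."""
    h = mu / n
    total = 0.0
    for i in range(n + 1):
        t = i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * t ** a * math.exp(-v * t)
    return total * h / 3


def g_bound(mu, v, a):
    """min{Gamma(a+1) v^(-(a+1)), mu^(a+1)/(a+1)}: the bound from part (b)."""
    return min(math.gamma(a + 1) * v ** (-(a + 1)), mu ** (a + 1) / (a + 1))


for mu, v in [(0.01, 1.0), (1.0, 1.0), (10.0, 10.0)]:
    print(mu, v, g(mu, v, 2.0), g_bound(mu, v, 2.0))
```

When $\mu v$ is small, the polynomial bound $\mu^{a+1}/(a+1)$ is the active one; when $\mu v$ is large, $\Gamma(a+1)v^{-(a+1)}$ takes over and is nearly attained.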